A Smart Movie Suitability Rating System Based on Subtitle

With the rapid growth in the number of movies being released, it can be very challenging to decide whether a movie is suitable for family viewing. Almost every country has a movie rating system that determines the age group for which a movie is suitable. However, these rating systems require a professional to watch the full movie. In this paper, we develop a model that determines the rating level of a movie using only its subtitles, without any professional intervention. To convert the text data to numerical features, we use the TF-IDF vectorizer, the WIDF vectorizer, and the Glasgow weighting scheme. We evaluate random forest, support vector machine, k-nearest neighbor, and multinomial naive Bayes classifiers to find the combination that achieves the highest results, reaching an accuracy of 85%. The results of our classification approach are promising, and the approach can be used by movie rating committees for pre-evaluation. Cautionary note: Some sections of this paper contain words that many will find offensive or inappropriate; however, this cannot be avoided owing to the nature of the work.
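To make the pipeline described above concrete, the following is a minimal sketch of one vectorizer and classifier pairing: TF-IDF weighting followed by k-nearest neighbor with cosine similarity. It is not the implementation used in this paper. The smoothed IDF formula, the whitespace tokenizer, the helper names, and the toy subtitle lines with their "G" and "R" labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokenization.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for each term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(text, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen terms are ignored."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are already L2-normalised, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(text, train_vecs, labels, idf, k=1):
    """Majority label among the k training documents most similar to the text."""
    query = tfidf(text, idf)
    ranked = sorted(zip(train_vecs, labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy subtitles and rating labels, for illustration only.
train_texts = [
    "the puppy found a balloon at the picnic",
    "grandma baked cookies for the school picnic",
    "the gang fired a gun and blood covered the floor",
    "he grabbed the gun and threatened to kill the witness",
]
train_labels = ["G", "G", "R", "R"]

idf = fit_idf(train_texts)
train_vecs = [tfidf(t, idf) for t in train_texts]

prediction = knn_predict("someone pulled a gun in the alley",
                         train_vecs, train_labels, idf, k=1)
print(prediction)
```

In practice each subtitle file would be one document, and the same vectors could be passed to any of the other classifiers named above in order to compare combinations.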
