Thumbnail Selection with Convolutional Neural Network Based on Emotion Detection

The use of video streaming platforms is increasing day by day, and competition among platforms for broadcasting and sharing movies and TV series is intensifying. The proliferation of these platforms aims to raise quality and to let content be followed on a single platform. Movie and TV series platforms use artificial intelligence algorithms when sharing this content. The aim of this study is to create more attractive cover photos for users by finding suitable frames in a movie or TV series. First, the frames to be used as covers (thumbnails) on the platform were obtained. Unsuitable frames, namely those with closed eyes, blur, or no face, were removed. Deep learning was also used to label the images with objects and emotions according to the identity of each face. The thumbnails containing the most frequently recurring faces were selected, with a face recognition model developed at each step. The experimental results showed that the emotion model was successful.
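The frame-filtering step described above can be illustrated with a minimal sketch. The abstract does not state how blur or closed eyes were detected, so this example assumes two common heuristics: the variance of the Laplacian as a sharpness score and the eye aspect ratio (EAR) computed from six eye landmarks. The function names, the landmark ordering, and both thresholds are hypothetical and chosen only for illustration; face and landmark detection are assumed to happen elsewhere.

```python
from math import dist
from statistics import pvariance


def laplacian_variance(gray):
    """Sharpness score: variance of the 4-neighbour Laplacian over interior pixels.

    `gray` is a 2-D grayscale image given as a list of rows (at least 3x3).
    Low values suggest a blurry frame.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def eye_aspect_ratio(eye):
    """EAR for six (x, y) landmarks p1..p6.

    Assumed ordering: p1 and p4 are the eye corners, p2 and p3 lie on the
    upper lid, p5 and p6 on the lower lid (p6 below p2, p5 below p3).
    """
    p1, p2, p3, p4, p5, p6 = eye
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))


def keep_frame(gray, eyes, blur_threshold=100.0, ear_threshold=0.2):
    """Keep a frame only if a face was found, it is sharp, and all eyes are open.

    `eyes` is a list of six-landmark eye contours from a hypothetical
    upstream face-landmark detector; an empty list means no face was found.
    Both thresholds are illustrative assumptions, not values from the study.
    """
    if not eyes:
        return False
    if laplacian_variance(gray) < blur_threshold:
        return False
    return all(eye_aspect_ratio(eye) >= ear_threshold for eye in eyes)
```

In practice the thresholds would need tuning on frames from the target videos, since sharpness scores depend on resolution and content.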

Keywords:

Emotion Detection, Video Streaming Platforms, Convolutional Neural Network, Thumbnail
