Muhammed Salih TATAR, Rabia KÖK, Aybars UGUR

Turistler İçin, Engelli Bireylere Yönelik Ekler de İçeren, Görüntü Altyazılama Destekli Bilgilendirme ve Öneri Sistemi

Image Caption Generation Supported Information and Recommender System for Tourists, Including Supplements for Individuals With Disabilities

Tourism is one of the most important means by which individuals interact with different cultures. This work presents a web application that combines an information and recommendation system for tourists with image-captioning support and includes features for individuals with disabilities, intended as a technological and social innovation for the tourism sector. To deliver nearby popular venues and historical and touristic sites to tourists more effectively, the application uses location information obtained via GPS, a purpose-built data set covering popular places in Izmir, and the Apriori algorithm, a statistical method for mining association rules. To help visually impaired individuals explore the city, a Facilitator Module was designed that generates captions and performs object recognition on an input image supplied by the user, using transfer learning with the VGG16 and LSTM models. With this module, visually impaired users can obtain textual and audio descriptions of snapshots of streets and objects in the city, and can check whether keywords they entered in advance (fire, metro, statue, etc.) appear in the image. The data set, assembled from images in the MS-COCO, Flickr8k, Flickr30k, and Tourism48 data sets, was split into 80% training and 20% test data. In testing, the scores obtained were 55.41% for BLEU-1 and 30.15% for BLEU-2.

Keywords:

Tourism, Deep Learning, Object Recognition, Image Captioning, Recommender System
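
The BLEU-1 and BLEU-2 scores reported in the abstract measure n-gram overlap between generated and reference captions. The abstract does not say which implementation or settings were used, so the following is only a minimal pure-Python sketch of sentence-level cumulative BLEU with uniform weights, the standard brevity penalty, and no smoothing. The function names and the toy captions are hypothetical and are not taken from the system.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n):
    """Modified n-gram precision: each candidate n-gram count is clipped
    to the maximum count of that n-gram in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n):
    """Sentence-level cumulative BLEU up to max_n (uniform weights, no smoothing)."""
    precisions = [clipped_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    c = len(candidate)
    # Reference length closest to the candidate length (ties go to the shorter one).
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical generated caption and two hypothetical reference captions.
candidate = "a man stands near a statue".split()
references = [
    "a man is standing near a statue".split(),
    "a person stands beside the statue".split(),
]

print("BLEU-1:", bleu(candidate, references, 1))
print("BLEU-2:", bleu(candidate, references, 2))
```

In this toy case every candidate word appears in some reference, so BLEU-1 is 1.0, while only three of the five candidate bigrams match, which pulls BLEU-2 down to about 0.77. The same effect explains why a reported BLEU-2 is lower than the corresponding BLEU-1.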
