Tolga HAYIT

Investigation of the Effect of a Sample Dataset Created via Image Scraping on Convolutional Neural Network-Based Image Classification

One of the most important stages of deep learning-based image classification studies is data acquisition. The dataset used to train the model must be task-specific and of adequate quality, so building it can be a laborious and tiring process for researchers. Web scraping techniques offer researchers a way to build suitable datasets for their studies. For tasks that require large amounts of data, such as deep learning, these techniques can speed up the process considerably. In this context, this study investigates the classification performance achieved with a dataset created by image scraping for a sample image classification task. Different CNN models were trained on the resulting sample dataset. Accuracy and the other performance metrics support the conclusion that a dataset obtained through image scraping can be used for image classification tasks.

Keywords:

Image scraping, Web scraping, Convolutional Neural Network, Deep Learning, Image classification

