A study on text-to-SQL query prediction with natural language processing methods

In computer science, persistent information is stored systematically in databases. Accessing that information requires a certain level of technical knowledge. In this study, SQL queries are predicted from queries written in natural language, with the aim of allowing databases to be queried without technical expertise. The study draws on natural language processing techniques, and multilingual support, one of the main topics of natural language processing, is integrated into the application. The model used for natural-language-to-SQL prediction is an LSTM network trained on the Spider dataset. The SQL query predictions reached a success rate of 75%. In the overall system evaluation with multilingual support, the success rate reached 69.4%.
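
The flow described in the abstract (a question in any supported language, a predicted SQL query, a result from the database) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `singer` table, the helper names `translate_to_english`, `predict_sql`, and `answer`, and the idea that multilingual input is first mapped to English are not taken from the study. In particular, `predict_sql` is a lookup-table stand-in for the trained LSTM model, not the model itself.

```python
import sqlite3

# Hypothetical stand-in for the multilingual step: a tiny lookup table.
# A real system would call a translation component here.
TOY_TRANSLATIONS = {
    ("tr", "kaç şarkıcı var"): "how many singers are there",
}

# Hypothetical stand-in for the trained LSTM text-to-SQL model.
TOY_PREDICTIONS = {
    "how many singers are there": "SELECT COUNT(*) FROM singer",
}


def normalize(question: str) -> str:
    """Lowercase the question and drop surrounding whitespace and '?'."""
    return question.strip().rstrip("?").strip().lower()


def translate_to_english(question: str, lang: str) -> str:
    """Map a question to English (assumed pipeline step, toy lookup)."""
    if lang == "en":
        return normalize(question)
    return TOY_TRANSLATIONS[(lang, normalize(question))]


def predict_sql(english_question: str) -> str:
    """Return the SQL predicted for an English question (toy lookup)."""
    return TOY_PREDICTIONS[english_question]


def answer(question: str, lang: str, conn: sqlite3.Connection) -> list:
    """Run the full sketch: translate, predict SQL, execute the query."""
    sql = predict_sql(translate_to_english(question, lang))
    return conn.execute(sql).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with an illustrative 'singer' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO singer (name) VALUES (?)",
        [("Ayla",), ("Bora",), ("Cem",)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print(answer("Kaç şarkıcı var?", "tr", demo))
```

The point of the sketch is only that the user never writes SQL: the question is the sole input, and errors in either the translation step or the prediction step would surface as a wrong or failing query.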

Niğde Ömer Halisdemir Üniversitesi Mühendislik Bilimleri Dergisi
  • ISSN: 2564-6605
  • Publication frequency: 4 issues per year
  • First published: 2017
  • Publisher: Niğde Ömer Halisdemir Üniversitesi

A study on text-to-SQL query prediction with natural language processing methods

Asım Sinan YÜKSEL, Muhammed Abdulhamid KARABIYIK