Improving word embeddings projection for Turkish hypernym extraction

Corpus-driven approaches can automatically discover is-a relations between word pairs in a corpus; this problem is also called hypernym extraction. Formerly, lexico-syntactic patterns were used to capture hypernym relations, with language-specific syntactic rules crafted manually to build the patterns. More recent studies, on the other hand, have applied distributional approaches to word semantics: they extract semantic relations relying on the idea that similar words share similar contexts. Earlier distributional approaches applied one-hot bag-of-words (BOW) encoding. The dimensionality problem of BOW has since been addressed by various neural network approaches, which represent words as very short, dense vectors, or word embeddings. In this study, we used the word-embedding representation and employed an optimized projection algorithm to solve the hypernym problem. The supervised architecture learns a mapping function so that the embeddings (or vectors) of word pairs that are in a hypernym relation can be projected onto each other. In the training phase, the architecture first learns the embeddings of words and then the projection function from a list of word pairs. In the test phase, the projection function maps the embedding of a given word to the point closest to its hypernym. We utilized deep learning optimization methods to optimize the model and improved performance by tuning hyperparameters. We discussed our results through extensive cross-validation experiments; we also addressed a problem-specific loss function, monitored the hyperparameters, and evaluated the results under different settings. Finally, we showed that our approach outperforms baseline functions and previous studies for the Turkish language.
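
To make the projection step concrete, below is a minimal sketch of the general idea, not the paper's implementation: random stand-in vectors replace pretrained Turkish embeddings, a linear map W is trained with Adam so that each hyponym vector is projected near its hypernym vector, and a test query is answered by nearest-neighbor (cosine) search over the vocabulary. All names (`emb`, `train_pairs`, `predict_hypernym`) and the toy word pairs are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn.functional as F

dim = 50                                   # embedding size (assumed; not stated in the abstract)
vocab = ["kedi", "hayvan", "elma", "meyve", "araba", "tasit"]
rng = np.random.default_rng(0)
emb = {w: rng.standard_normal(dim).astype(np.float32) for w in vocab}  # stand-ins for trained vectors

# (hyponym, hypernym) training pairs -- toy examples, not the paper's data
train_pairs = [("kedi", "hayvan"), ("elma", "meyve"), ("araba", "tasit")]
X = torch.from_numpy(np.stack([emb[hypo] for hypo, _ in train_pairs]))
Y = torch.from_numpy(np.stack([emb[hyper] for _, hyper in train_pairs]))

W = torch.randn(dim, dim, requires_grad=True)      # the projection matrix to be learned
opt = torch.optim.Adam([W], lr=1e-3)               # one of the optimizers such a study might tune

for _ in range(2000):
    opt.zero_grad()
    loss = F.mse_loss(X @ W.T, Y)                  # squared error between projection and hypernym
    loss.backward()
    opt.step()

def predict_hypernym(word):
    """Project the word's embedding and return the nearest word in the vocabulary."""
    with torch.no_grad():
        q = torch.from_numpy(emb[word]) @ W.T
        mat = torch.from_numpy(np.stack([emb[w] for w in vocab]))
        sims = F.cosine_similarity(mat, q.unsqueeze(0))
        return vocab[int(sims.argmax())]

print(predict_hypernym("kedi"))                    # ideally "hayvan" after training
```

In the setting the abstract describes, the stand-in vectors would be replaced by embeddings learned from a Turkish corpus, and the projection and its hyperparameters would be selected by cross-validation rather than fixed as above.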
