Improving word embeddings projection for Turkish hypernym extraction

Corpus-driven approaches can automatically discover is-a relations between word pairs in a corpus; this task is also called hypernym extraction. Early work relied on lexico-syntactic patterns to capture hypernym relations, where language-specific syntactic rules were manually crafted to build the patterns. More recent studies have applied distributional approaches to word semantics, extracting semantic relations based on the idea that similar words share similar contexts. Early distributional approaches used one-hot bag-of-words (BOW) encoding. The dimensionality problem of BOW has been addressed by various neural network approaches that represent words as short, dense vectors, or word embeddings. In this study, we used the word embedding representation and employed an optimized projection algorithm to solve the hypernym extraction problem. The supervised architecture learns a mapping function so that the embeddings (or vectors) of word pairs in a hypernym relation can be projected to each other. In the training phase, the architecture first learns the embeddings of words and then the projection function from a list of word pairs. In the test phase, the projection function maps the embedding of a given word to the point closest to its hypernym. We applied deep learning optimization methods to optimize the model and improved performance by tuning hyperparameters. We discuss our results based on many cross-validation experiments. We also address a problem-specific loss function, monitor hyperparameters, and evaluate the results under different settings. Finally, we show that our approach outperforms baseline functions and other studies for the Turkish language.
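The projection idea described above can be sketched in a minimal form: learn a linear map W from hyponym embeddings to hypernym embeddings, then predict a test word's hypernym as the vocabulary word whose embedding is nearest (by cosine similarity) to the projected vector. The sketch below uses small synthetic embeddings and plain gradient descent on a squared-error loss; the study itself uses real Turkish word embeddings, tuned optimizers, and a problem-specific loss, so everything here (the vocabulary, the dimensionality, the learning rate) is an illustrative assumption, not the authors' exact setup.

```python
import numpy as np

# Synthetic embedding table for illustration only; the paper would use
# real pretrained Turkish word embeddings instead.
rng = np.random.default_rng(0)
dim = 8
vocab = ["kedi", "hayvan", "elma", "meyve", "gül", "çiçek"]
emb = {w: rng.normal(size=dim) for w in vocab}

# Training pairs (hyponym, hypernym): cat->animal, apple->fruit, rose->flower.
pairs = [("kedi", "hayvan"), ("elma", "meyve"), ("gül", "çiçek")]
X = np.stack([emb[h] for h, _ in pairs])  # hyponym vectors
Y = np.stack([emb[g] for _, g in pairs])  # hypernym vectors

# Learn a linear projection W minimizing ||XW - Y||^2 with gradient descent
# (a stand-in for the optimized projection trained with tuned optimizers).
W = np.eye(dim)
lr = 0.05
for _ in range(500):
    grad = 2.0 * X.T @ (X @ W - Y) / len(pairs)
    W -= lr * grad

def predict_hypernym(word):
    """Project the word's embedding and return the nearest other vocab word."""
    q = emb[word] @ W
    def cosine(w):
        v = emb[w]
        return q @ v / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9)
    return max((w for w in vocab if w != word), key=cosine)
```

With the fixed random seed above, the projection fits the three training pairs closely, so `predict_hypernym("kedi")` recovers `"hayvan"`; the interesting (and harder) case in the paper is generalization to word pairs unseen during training.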

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK