Improving word embeddings projection for Turkish hypernym extraction
Corpus-driven approaches can automatically discover is-a relations between word pairs in a corpus; this task is known as hypernym extraction. Early work relied on lexico-syntactic patterns to extract hypernym relations, with language-specific syntactic rules manually crafted to build the patterns. More recent studies have instead applied distributional approaches to word semantics, extracting semantic relations based on the idea that similar words share similar contexts. Early distributional approaches used one-hot bag-of-words (BOW) encoding; the dimensionality problem of BOW has since been addressed by various neural network approaches that represent words as short, dense vectors, or word embeddings. In this study, we used word embedding representations and employed an optimized projection algorithm to solve the hypernym extraction problem. The supervised architecture learns a mapping function so that the embeddings (or vectors) of word pairs in a hypernym relation can be projected to each other. In the training phase, the architecture first learns the embeddings of words and the projection function from a list of word pairs. In the test phase, the projection function maps the embedding of a given word to a point that is closest to its hypernym. We utilized deep learning optimization methods to optimize the model and improved performance by tuning hyperparameters. We discuss our results through extensive cross-validation experiments, address a problem-specific loss function, monitor hyperparameters, and evaluate the results under different settings. Finally, we show that our approach outperforms baseline functions and previous studies for the Turkish language.
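The training/test scheme described above can be sketched in a few lines. The following is an illustrative toy example, not the study's actual model: the vocabulary, training pairs, and random embeddings are invented for the sketch, and the projection matrix is fit with a closed-form least-squares solve rather than the gradient-based deep learning optimizers the study uses.

```python
import numpy as np

# Toy setup: 5-dimensional embeddings for a tiny vocabulary.
# In the real setting these would be pretrained word embeddings;
# names and dimensions here are assumptions for illustration only.
rng = np.random.default_rng(0)
dim = 5
vocab = ["kedi", "hayvan", "elma", "meyve", "gul", "cicek"]
E = {w: rng.normal(size=dim) for w in vocab}

# Training pairs (hyponym, hypernym): learn W so that E[hypo] @ W ~ E[hyper].
pairs = [("kedi", "hayvan"), ("elma", "meyve"), ("gul", "cicek")]
X = np.stack([E[h] for h, _ in pairs])  # hyponym embeddings, shape (n, dim)
Y = np.stack([E[p] for _, p in pairs])  # hypernym embeddings, shape (n, dim)

# Closed-form least-squares fit of min_W ||X W - Y||^2
# (a baseline stand-in for the gradient-based optimization in the paper).
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def predict_hypernym(word):
    """Project the word's embedding and return the nearest other word."""
    projected = E[word] @ W
    best, best_sim = None, -np.inf
    for cand in vocab:
        if cand == word:
            continue  # never predict the query word itself
        v = E[cand]
        sim = projected @ v / (np.linalg.norm(projected) * np.linalg.norm(v))
        if sim > best_sim:
            best, best_sim = cand, sim
    return best
```

At test time, `predict_hypernym` implements the nearest-neighbor lookup from the abstract: the projected point is compared against all candidate embeddings by cosine similarity, and the closest word is returned as the predicted hypernym.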