Turkish entity discovery with word embeddings
Entity-linking systems link noun-phrase mentions in a text to their corresponding knowledge-base entities in order to enrich the text with metadata. Wikipedia is a popular and comprehensive knowledge base that is widely used in entity-linking systems. However, long-tail entities are not popular enough to have their own Wikipedia articles, so a knowledge base built from Wikipedia entities is limited to popular entities. To overcome this coverage limitation of Wikipedia-based entity-linking systems, this paper presents an entity-discovery system that can detect the semantic types of entities that are not defined in Wikipedia. The effectiveness of the proposed system was validated empirically on generated data sets for the Turkish language. The experimental results show that, in terms of accuracy, our system performs competitively with previous methods in the literature. Its high performance is achieved through a method that learns word embeddings for candidate entities.
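As a rough illustration of how word embeddings can drive semantic typing of an unlinked entity, the Python sketch below makes two simplifying assumptions that are not taken from the paper: a candidate entity is represented by the average embedding of the words that appear around its mentions, and its type is the one whose centroid (built from already-typed entities) is most cosine-similar. The embeddings, Turkish context words, and type labels (`LOCATION`, `SPORTS_TEAM`) are hypothetical toy values; a real system would learn embeddings from a large Turkish corpus and would likely use a trained classifier rather than nearest centroid.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors component-wise."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def entity_vector(context_words: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Assumption: an entity is the mean embedding of its known context words."""
    known = [embeddings[w] for w in context_words if w in embeddings]
    if not known:
        raise ValueError("no context word has an embedding")
    return mean_vector(known)


def type_centroids(
    labeled: Sequence[Tuple[Sequence[str], str]], embeddings: Dict[str, Vector]
) -> Dict[str, Vector]:
    """Build one centroid per semantic type from (context_words, type) pairs."""
    grouped: Dict[str, List[Vector]] = {}
    for words, sem_type in labeled:
        grouped.setdefault(sem_type, []).append(entity_vector(words, embeddings))
    return {t: mean_vector(vs) for t, vs in grouped.items()}


def predict_type(
    context_words: Sequence[str],
    embeddings: Dict[str, Vector],
    centroids: Dict[str, Vector],
) -> str:
    """Assign the type whose centroid is most cosine-similar to the entity vector."""
    v = entity_vector(context_words, embeddings)
    return max(centroids, key=lambda t: cosine(v, centroids[t]))


# Hypothetical 3-dimensional toy embeddings (not learned from real data).
embeddings: Dict[str, Vector] = {
    "şehir": [1.0, 0.0, 0.1],     # city
    "belediye": [0.9, 0.1, 0.0],  # municipality
    "ilçe": [0.8, 0.0, 0.2],      # district
    "gol": [0.0, 1.0, 0.1],       # goal
    "maç": [0.1, 0.9, 0.0],       # match
    "takım": [0.0, 0.8, 0.2],     # team
}
labeled = [(["şehir", "belediye"], "LOCATION"), (["gol", "maç"], "SPORTS_TEAM")]
centroids = type_centroids(labeled, embeddings)

if __name__ == "__main__":
    print(predict_type(["ilçe", "belediye"], embeddings, centroids))  # LOCATION
    print(predict_type(["takım"], embeddings, centroids))             # SPORTS_TEAM
```

Nearest centroid keeps the example short and dependency-free; swapping in a linear classifier over the same entity vectors would be the more usual choice when many labeled entities are available.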