Turkish entity discovery with word embeddings

Entity-linking systems link noun-phrase mentions in a text to the corresponding knowledge-base entities, thereby enriching the text with metadata. Wikipedia is a popular and comprehensive knowledge base that is widely used in entity-linking systems. However, long-tail entities are not popular enough to have their own Wikipedia articles, so a knowledge base built from Wikipedia entities is limited to popular entities. To overcome this coverage limitation of Wikipedia-based entity-linking systems, this paper presents an entity-discovery system that detects the semantic types of entities that are not defined in Wikipedia. We validated the proposed system empirically on data sets generated for Turkish. The experimental results show that our system achieves accuracy competitive with previous methods in the literature. This performance is achieved through a method that learns word embeddings for candidate entities.
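The abstract does not state how the learned embeddings are turned into semantic-type predictions. As one illustration, assuming each candidate entity already has an embedding vector and a few entities with known types are available, a nearest-centroid classifier under cosine similarity can assign a type to an unseen entity. This is a swapped-in technique for illustration, not the paper's method; the type labels, toy 3-dimensional vectors, and function names below are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def type_centroids(labeled):
    """Average the embeddings of known entities for each semantic type."""
    centroids = {}
    for etype, vecs in labeled.items():
        dim = len(vecs[0])
        centroids[etype] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return centroids

def predict_type(vec, centroids):
    """Return the type whose centroid is most cosine-similar to vec."""
    return max(centroids, key=lambda t: cosine(vec, centroids[t]))

# Hypothetical toy embeddings; real ones would be learned from a Turkish corpus.
labeled = {
    "PERSON": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "LOCATION": [[0.1, 0.9, 0.1], [0.0, 0.8, 0.2]],
    "ORGANIZATION": [[0.1, 0.1, 0.9], [0.2, 0.0, 0.8]],
}
centroids = type_centroids(labeled)

# An unseen candidate entity (e.g. one with no Wikipedia article).
print(predict_type([0.85, 0.15, 0.05], centroids))  # → PERSON
```

A centroid classifier is used here only because it is the simplest way to show embeddings driving type prediction; a trained classifier over the same vectors would expose the same interface.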