Matching named entities with the aid of Wikipedia

We propose a framework that matches named entities using features extracted from Wikipedia. We describe how these features are extracted and how Wikipedia's structure is exploited: a term-concepts table built from Wikipedia is integrated with its redirect links, and the internal links and category structure are used to compute the relatedness between concepts. We evaluate the framework on word sense disambiguation and named entity matching, and its results are competitive with those of state-of-the-art learning-based systems. We also present a context-independent method for named entity matching and compare its results with those of other systems.
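
As a rough illustration of how a term-concepts table, redirect links, and link-based relatedness could fit together, the following is a minimal sketch and not the framework's actual implementation. All data and names (`REDIRECTS`, `TERM_CONCEPTS`, `OUT_LINKS`, `resolve`, `candidates`, `relatedness`, `best_match`) are hypothetical, and the Jaccard overlap of outgoing links is a stand-in assumption for the relatedness measure, which the abstract does not specify. Category structure is omitted.

```python
# Toy data: every entry below is an illustrative assumption, not real Wikipedia output.
REDIRECTS = {"Big Apple": "New York City", "NYC": "New York City"}

# Term-concepts table: lowercased surface form -> candidate article titles.
TERM_CONCEPTS = {
    "new york": ["New York City", "New York (state)"],
    "nyc": ["New York City"],
    "big apple": ["Big Apple"],  # a redirect title, resolved at lookup time
    "jaguar": ["Jaguar", "Jaguar Cars"],
}

# Outgoing internal links per article (hypothetical).
OUT_LINKS = {
    "New York City": {"Manhattan", "Hudson River", "United States"},
    "New York (state)": {"Albany, New York", "Hudson River", "United States"},
    "Jaguar": {"Felidae", "South America"},
    "Jaguar Cars": {"Coventry", "Automotive industry"},
}


def resolve(title):
    """Follow a redirect link to its target article, if one exists."""
    return REDIRECTS.get(title, title)


def candidates(term):
    """Look up a surface form and return its redirect-resolved concepts."""
    found = TERM_CONCEPTS.get(term.strip().lower(), [])
    return {resolve(c) for c in found}


def relatedness(a, b):
    """Jaccard overlap of outgoing links (a stand-in relatedness measure)."""
    links_a = OUT_LINKS.get(a, set())
    links_b = OUT_LINKS.get(b, set())
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 0.0


def best_match(term_a, term_b):
    """Return (score, concept_a, concept_b) for the best concept pair, or None."""
    best = None
    for ca in candidates(term_a):
        for cb in candidates(term_b):
            # Identical concepts are a perfect match; otherwise use relatedness.
            score = 1.0 if ca == cb else relatedness(ca, cb)
            if best is None or score > best[0]:
                best = (score, ca, cb)
    return best
```

In this sketch, `best_match("NYC", "Big Apple")` resolves both surface forms to the same concept without using any surrounding context, which is the sense in which such a lookup is context-independent.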

