Resolving namesakes using the author's social network

Author name ambiguity arises when multiple authors share the same name or when a single author's name appears in several variations. It degrades search results and the correct attribution of publications in bibliographic databases. Existing solutions require either the actual number of ambiguous authors or extra information collected from the Web. However, in many scenarios, obtaining such auxiliary information is impossible or requires considerable extra effort. An effective and scalable method, ASONET, is proposed that uses graph community detection algorithms and graph operations to disambiguate namesakes. The citation dataset is preprocessed and ambiguous author blocks are formed. A graph structural clustering algorithm, gSkeletonClu, is applied to identify hubs, outliers, and clusters of nodes in the coauthor graph. Namesakes are resolved by splitting these clusters across the hub if their feature vector similarity is less than a predefined threshold. ASONET uses only coauthors and titles, which are available in all bibliographic databases. To validate ASONET's performance, experiments are performed on two real-world datasets, Arnetminer and DBLP. The results confirm that ASONET is scalable and outperforms the baselines.
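The block-cluster-split idea can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not ASONET itself. Connected components of the coauthor graph (with the ambiguous name removed as the hub) stand in for gSkeletonClu. Each cluster's titles become a bag-of-words vector compared by cosine similarity. Clusters stay split across the hub when their similarity is below the threshold. The function names, the toy `(title, authors)` paper format, the stopword list, and the default threshold of 0.2 are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict, deque
from itertools import combinations

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def coauthor_graph(papers, name):
    """Adjacency sets over the coauthors of `name`; `name` itself (the hub) is left out."""
    adj = {}
    for _, authors in papers:
        co = [a for a in authors if a != name]
        for a in co:
            adj.setdefault(a, set())
        for a, b in combinations(co, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def components(adj):
    """Connected components via BFS (a simple stand-in for gSkeletonClu clusters)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return comps


def title_vector(titles):
    """Bag-of-words term counts over a cluster's paper titles."""
    return Counter(w for t in titles for w in re.findall(r"[a-z]+", t.lower())
                   if w not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(c * v[w] for w, c in u.items() if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def split_namesakes(papers, name, threshold=0.2):
    """Group paper indices listed under `name` into presumed distinct authors."""
    comps = components(coauthor_graph(papers, name))
    cluster_of = {a: i for i, comp in enumerate(comps) for a in comp}
    papers_in = defaultdict(list)
    for idx, (_, authors) in enumerate(papers):
        co = [a for a in authors if a != name]
        if co:  # solo-authored papers carry no coauthor evidence and stay unassigned here
            papers_in[cluster_of[co[0]]].append(idx)
    vectors = {c: title_vector(papers[i][0] for i in idxs) for c, idxs in papers_in.items()}

    # Union-find: clusters stay split across the hub unless their titles are similar enough.
    parent = {c: c for c in vectors}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in combinations(sorted(vectors), 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for c, idxs in papers_in.items():
        groups[find(c)].extend(idxs)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    toy = [
        ("Graph clustering for name disambiguation", ["J. Smith", "A. Lee", "B. Kim"]),
        ("Community detection in coauthor graphs", ["J. Smith", "A. Lee"]),
        ("Protein folding with molecular dynamics", ["J. Smith", "C. Wu"]),
        ("Molecular dynamics of protein folding", ["J. Smith", "C. Wu", "D. Ray"]),
        ("Name disambiguation with graph clustering", ["J. Smith", "E. Fox"]),
    ]
    print(split_namesakes(toy, "J. Smith"))  # → [[0, 1, 4], [2, 3]]
```

In the toy data the "graph" papers and the "protein" papers share no coauthors and no title terms, so they remain two separate "J. Smith" identities. The lone E. Fox paper is merged into the graph cluster because its title overlaps strongly with that cluster. Union-find makes merging transitive, which is a simplifying choice. The feature vectors and threshold used by the actual method are not reproduced here.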
