Incremental author name disambiguation using author profile models and self-citations

Author name ambiguity in bibliographic databases (BDs) such as DBLP is a challenging problem that degrades information retrieval quality, citation analysis, and the proper attribution of work to authors. It occurs when several authors share the same name (homonyms) or when one author publishes under several name variants (synonyms). Traditionally, much research has disambiguated the whole bibliographic database at once whenever new citations are added. However, this is time-consuming and discards the effects of any manual disambiguation already performed. Only a few incremental author name disambiguation methods have been proposed, and they produce fragmented clusters, which lowers their accuracy. In this paper, we propose CAND, a method that uses author profile models and self-citations for incremental author name disambiguation. CAND introduces name indices that improve the overall system response by comparing newly inserted references with the indexed author clusters. Author profile models are generated for the existing authors in BDs and help disambiguate the newly inserted references. A comparator function is proposed to resolve incremental author name ambiguity; it uses the strongest bibliometric features, namely coauthors, titles, author profile models, and self-citations. Two real-world data sets, one from Arnetminer and the other from BDBComp, are used to validate CAND's performance. Experimental results show that CAND performs better overall than existing state-of-the-art incremental author name disambiguation methods.
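The incremental flow described above (look up indexed author clusters by name, score a new reference against each candidate, then assign it or open a new cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not CAND itself. The abstract does not specify the name-index key, the form of the author profile model, or how the comparator combines features. Here the key is assumed to be first initial plus surname, the profile model is approximated by a bag of title terms (which merges the separate "titles" and "profile model" features into one), and the weights `W_COAUTHOR`, `W_TITLE`, `W_SELF_CITE` and the cutoff `THRESHOLD` are hypothetical values.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical weights and acceptance threshold; NOT taken from the paper.
W_COAUTHOR, W_TITLE, W_SELF_CITE = 0.5, 0.3, 0.2
THRESHOLD = 0.3


@dataclass
class Citation:
    cid: str
    name: str                                  # the ambiguous author name on this record
    coauthors: set
    title: str
    cites: set = field(default_factory=set)    # IDs of citations this record cites


def terms(title):
    # Crude stand-in for a profile model: lowercased title words longer than 3 chars.
    return {w for w in title.lower().split() if len(w) > 3}


@dataclass
class AuthorCluster:
    author_id: str
    citation_ids: set = field(default_factory=set)
    coauthors: set = field(default_factory=set)
    profile_terms: set = field(default_factory=set)  # assumed profile: union of title terms

    def absorb(self, c):
        # Add the citation to this author and update the cluster's profile.
        self.citation_ids.add(c.cid)
        self.coauthors |= c.coauthors
        self.profile_terms |= terms(c.title)


def name_key(name):
    # Hypothetical index key: first initial + surname, so "J. Smith" and "John Smith" collide.
    parts = name.lower().replace(".", " ").split()
    return parts[0][0] + " " + parts[-1]


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def compare(c, cluster):
    # Weighted sum of coauthor overlap, title/profile overlap, and a self-citation indicator.
    self_cite = 1.0 if c.cites & cluster.citation_ids else 0.0
    return (W_COAUTHOR * jaccard(c.coauthors, cluster.coauthors)
            + W_TITLE * jaccard(terms(c.title), cluster.profile_terms)
            + W_SELF_CITE * self_cite)


class IncrementalDisambiguator:
    def __init__(self):
        self.index = defaultdict(list)   # name key -> clusters sharing that key
        self.next_id = 0

    def insert(self, c):
        # Only clusters under the same name key are compared; the rest of the DB is untouched.
        candidates = self.index[name_key(c.name)]
        best = max(candidates, key=lambda cl: compare(c, cl), default=None)
        if best is None or compare(c, best) < THRESHOLD:
            best = AuthorCluster(f"a{self.next_id}")   # no good match: new author
            self.next_id += 1
            candidates.append(best)
        best.absorb(c)
        return best.author_id


d = IncrementalDisambiguator()
a = d.insert(Citation("p1", "John Smith", {"Ana Lee"}, "Graph mining for citation networks"))
b = d.insert(Citation("p2", "J. Smith", {"Ana Lee"}, "Scalable graph mining", cites={"p1"}))
c = d.insert(Citation("p3", "J. Smith", {"Bob Ray"}, "Protein folding kinetics"))
print(a, b, c)  # a0 a0 a1
```

In this toy run, `p2` joins the first cluster because it shares a coauthor, overlaps in title terms, and cites `p1` (a self-citation), while `p3` shares none of these and starts a new cluster. Keeping one index bucket per name key is what lets each insertion avoid re-clustering the whole database.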