A novel approach for entity resolution in scientific documents using context graphs

Changqin Huang, Jia Zhu, Xiaodi Huang, Min Yang, Gabriel Fung, Qintai Hu

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Entity resolution refers to disambiguating and resolving entities in structured and unstructured data. Developing effective resolution algorithms is important for processing scientific documents, particularly biomedical literature. Specifically, name ambiguity among biomedical entities is a primary problem that must be solved in the knowledge extraction process. In this paper, we present a novel approach to disambiguating gene/protein names by using context graphs. A set of document abstracts is used to build the context graphs by uncovering the indirect co-occurrence relationships among words. Feature vectors of the graphs are constructed according to information gain (IG) on the word set. To evaluate the IG values, we propose a new metric that integrates word frequency (WF), dispersion degree (DD), and concentration degree (CD). Finally, entity resolution is performed by applying a support vector machine (SVM). Compared to existing approaches, the proposed method can discover latent information from the context of entity names, rather than relying on statistical information such as word occurrence counts. Based on the results of comprehensive experiments on two benchmark datasets, we conclude that the proposed method for resolving ambiguous entities is promising compared with several existing solutions.
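
The pipeline outlined in the abstract (co-occurrence graph, indirect context, IG-based features) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the WF/DD/CD metric, so the sketch uses textbook information gain instead. The one-hop expansion rule, the toy documents, the sense labels, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_graph(docs):
    """Link two words whenever they appear in the same document."""
    graph = defaultdict(set)
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def expand_context(tokens, graph):
    """Add the graph neighbours of a document's words.

    Assumed stand-in for indirect co-occurrence: a word that never
    appears in this document can still enter its context through a
    shared neighbour.
    """
    expanded = set(tokens)
    for word in tokens:
        expanded |= graph.get(word, set())
    return expanded


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(word, contexts, labels):
    """Textbook IG of the binary feature 'word occurs in the context'."""
    present = [y for ctx, y in zip(contexts, labels) if word in ctx]
    absent = [y for ctx, y in zip(contexts, labels) if word not in ctx]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def feature_vectors(contexts, vocabulary):
    """Binary feature vector per context over the chosen vocabulary."""
    return [[1 if w in ctx else 0 for w in vocabulary] for ctx in contexts]


# Toy data (hypothetical): four tokenised abstracts mentioning one
# ambiguous name, each labelled with the sense it refers to.
docs = [
    ["kinase", "phosphorylation"],
    ["phosphorylation", "signaling"],
    ["promoter", "transcription"],
    ["transcription", "expression"],
]
labels = ["protein", "protein", "gene", "gene"]

graph = build_cooccurrence_graph(docs)
raw_contexts = [set(d) for d in docs]
expanded_contexts = [expand_context(d, graph) for d in docs]

ig_raw = information_gain("kinase", raw_contexts, labels)            # about 0.31
ig_expanded = information_gain("kinase", expanded_contexts, labels)  # 1.0

# Rank words by IG on the expanded contexts and keep the top three.
vocab = sorted({w for ctx in expanded_contexts for w in ctx})
ranked = sorted(vocab, key=lambda w: (-information_gain(w, expanded_contexts, labels), w))
vectors = feature_vectors(expanded_contexts, ranked[:3])
```

In this toy case, "kinase" occurs directly in only one abstract, so its IG on the raw word sets is low. After expansion it enters the context of the second "protein" abstract through the shared word "phosphorylation" and separates the two senses perfectly. The resulting vectors could then be passed to an SVM classifier (for example `sklearn.svm.SVC`), which is omitted here to keep the sketch dependency-free.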

Original language: English
Pages (from-to): 431-441
Number of pages: 11
Journal: Information Sciences
Volume: 432
Early online date: Dec 2017
Publication status: Published - Mar 2018
