GrantExtractor: A winning system for extracting grant support information from biomedical literature

Suyang Dai, Zihan Zhang, Wenxuan Zuo, Xiaodi Huang, Shanfeng Zhu

Research output: Book chapter/Published conference paper › Conference paper › peer-review

1 Citation (Scopus)

Abstract

Grant support (GS), which refers to funding agencies and contract numbers, is an important piece of information in the MEDLINE database. For funding organizations, GS plays a crucial role in tracking their funding outcomes. In this paper, we present GrantExtractor, a pipeline system that automatically extracts funding information from biomedical literature. GrantExtractor is a novel solution to the practical problem of GS information extraction, which involves both named entity recognition and relation extraction. Our approach integrates several modern machine learning techniques. First, funding sentences in articles are identified by a sentence classifier. Grant number and agency entities are then extracted from these funding sentences by a bi-directional LSTM with a CRF layer (BiLSTM-CRF), as well as by pattern matching. After removing noisy numbers with a multi-class model, we finally match each grant number with its corresponding agency. Experimental results on benchmark datasets show that GrantExtractor clearly outperformed all baseline methods. In addition, GrantExtractor won first place in Task 5C of the 2017 BioASQ challenge, achieving a micro-recall of 0.9526 on 22,610 articles. In relative terms, this is 33% higher than 0.7174, the highest baseline score, which was obtained by the “BioASQ Filtering” baseline provided by the National Library of Medicine (NLM). Moreover, GrantExtractor achieved a micro F-measure as high as 0.90 in the task of extracting grant pairs.
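
The micro-averaged scores quoted in the abstract pool counts over all articles before dividing. The following is a minimal sketch of that computation for grant pairs, not the official BioASQ evaluation code. The function names, the data layout, and the toy grant identifiers are assumptions made for illustration; only the two scores 0.9526 and 0.7174 come from the abstract.

```python
from typing import Dict, Set, Tuple

# A grant pair is (grant_number, agency); this layout is an assumption.
Pair = Tuple[str, str]


def micro_scores(gold: Dict[str, Set[Pair]],
                 pred: Dict[str, Set[Pair]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F-measure over all articles."""
    tp = fp = fn = 0
    # Pool true positives, false positives and false negatives across articles.
    for article_id in gold.keys() | pred.keys():
        g = gold.get(article_id, set())
        p = pred.get(article_id, set())
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def relative_gain(score: float, baseline: float) -> float:
    """Relative improvement of score over baseline."""
    return (score - baseline) / baseline


# Hypothetical toy data: identifiers below are made up for illustration.
gold = {
    "article-1": {("R01 GM000001", "NIGMS"), ("P30 CA000002", "NCI")},
    "article-2": {("U54 HL000003", "NHLBI")},
}
pred = {
    "article-1": {("R01 GM000001", "NIGMS")},
    "article-2": {("U54 HL000003", "NHLBI"), ("T32 XX000004", "NIH")},
}

precision, recall, f_measure = micro_scores(gold, pred)

# Scores reported in the abstract: system micro-recall vs. best baseline.
gain = relative_gain(0.9526, 0.7174)
```

On the toy data there are two correct pairs, one spurious pair, and one missed pair, so precision, recall, and F-measure all come out to 2/3. The relative gain works out to roughly 0.33, which is consistent with the 33% figure in the abstract being a relative rather than an absolute difference.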
Original language: English
Title of host publication: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Publisher: IEEE Xplore
Pages: 333-340
Number of pages: 8
ISBN (Electronic): 9781538654880
ISBN (Print): 9781538654897 (Print on demand)
Publication status: Published - 24 Jan 2019
Event: 2018 International Conference on Bioinformatics and Biomedicine: BIBM 2018 - NH Collection Madrid Eurobuilding, Madrid, Spain
Duration: 03 Dec 2018 to 06 Dec 2018
http://orienta.ugr.es/bibm2018/
https://ieeexplore.ieee.org/xpl/conhome/8609864/proceeding (proceedings)

Conference

Conference: 2018 International Conference on Bioinformatics and Biomedicine
Country/Territory: Spain
City: Madrid
Period: 03/12/18 to 06/12/18