Abstract

We describe a system that automatically extracts biological events from biomedical journal articles and translates those events into Biological Expression Language (BEL) statements. The system incorporates existing text mining components for coreference resolution and biological event extraction, together with a strategy for BEL statement generation that had not previously been formally evaluated. Although our primary aim was to address the BEL track (Track 4) at BioCreative V (2015), we also investigate how incorporating coreference resolution might impact event extraction in the biomedical domain. We report that our system achieved the best performance at the full BEL statement level, with F-scores of 20.2 in stage 1 and 35.2 in stage 2, where gold standard entities were provided. We also report that our results on the training dataset show a benefit from integrating coreference resolution with event extraction.

Introduction

Biological networks such as gene regulatory networks, signal transduction pathways and metabolic pathways capture a series of protein-protein interactions, or relationships between proteins and chemicals, which could explain complex biological processes underlying specific health conditions. Since the scientific literature contains knowledge about relationships and events involving biomolecular entities such as proteins, genes, and chemicals, many text mining approaches have been developed for automatic information extraction from the literature ( 1–3 ). There is also much interest in standard representations of biological networks, such as the Biological pathway exchange language ( 4 ), the Systems Biology Markup Language ( 5 ) and the Biological Expression Language (BEL) ( 6 ). Such representations in a structured syntax can support not only visualisation of biological systems, but also computational modelling of these systems ( 7–9 ).

The BioCreative V Track 4 (BEL track) addressed the extraction of causal network information expressed in BEL, a formalised representation language for biological expression ( 10 ). BEL statements represent knowledge of relationships between biomolecular entities. They can express biological relationships such as protein–protein interactions, or other relations between biological processes and disease stages. The BEL structure is described in detail in the ‘BEL statements and dataset’ section. Two subtasks were organised in the BEL track: generation of the corresponding BEL statement for a given piece of text evidence (Task 1), and identification of at most 10 pieces of textual evidence for a given BEL statement (Task 2). For Task 1, systematically selected sentences from publications are provided ( 11 ), and the BEL statements corresponding to each sentence must be generated automatically (see Figure 1 ). The BEL track aims to stimulate development of tools that recognise biological events and produce BEL statements for those events. The work described in this article addresses BEL Task 1.

 (a) Sample sentence from the BEL Track training corpus. (b) BEL statements corresponding to the sample sentence. (c) Representation of BEL statement derived from the sample sentence (a). The BEL statement describes that the abundance of chemical compound designated by ‘glucocorticoid’ in the CHEBI namespace increases the abundance of protein designated by ‘Resp18’ in the MGI namespace.
Figure 1.


There has been significant progress in event extraction from the biomedical literature in recent years through targeted tasks such as BioNLP-ST ( 12–14 ) and BioCreative PPI tasks ( 15 , 16 ). However, extraction of complex and hidden events involving genes and proteins remains a particular challenge due to the use of coreference expressions in texts ( 17 ). Coreference expressions such as pronouns (e.g. ‘it, they’), and definite noun phrases (e.g. ‘the protein, these genes’) are one of the major obstacles for existing methods, limiting the scope of most biomedical information extraction systems to individual sentences that directly mention entities ( 18–20 ). Abundant anaphoric mentions are used to refer to biomolecular entities that were previously mentioned in the same text, such as when interactions or events are described across clauses of sentences. With the identification of these hidden relationships, coreference resolution can benefit literature-based event extraction. Hence, we hypothesised that resolving references could improve performance on the BEL statement extraction task.

To address Task 1, therefore, we developed a pipeline system which consists of the Turku Event Extraction System (TEES) ( 21 ), coupled with a coreference resolution component and an automatic system for generating BEL statements that has not previously been formally evaluated ( 22 ). We incorporate a simple rule-based coreference resolution system developed for the biomedical domain ( 23 ). In this article, we describe our pipeline in detail, introduce a strategy for mapping from BioNLP-ST event types to BEL functions, and report the overall performance of our approach in the BEL track ( 24 ). Among five participating teams, our submissions achieved the highest F-score at the full statement level for Task 1 ( 25 ). We also present our investigation of how incorporating coreference resolution impacts the performance of event extraction for the BEL track.

Background

There have been community-wide efforts targeting biomedical event extraction since 2009, in a series of evaluations known as the BioNLP Shared Tasks ( 12 , 26 , 27 ). The initial task in 2009 mainly focused on extraction of biomedical events involving genes and proteins. Events were represented in terms of their type, trigger expressions, arguments and roles of arguments; analysis was based on event annotations in the GENIA journal abstract corpus ( 12 ). In 2011, the scope of the task was extended from journal abstracts to full journal articles ( 13 ). A coreference resolution subtask was incorporated in 2013, but it was not attempted by any participating team ( 14 ). For the GENIA event extraction shared task, a state-of-the-art system (TEES) using machine learning methods achieved the best performance in the 2009 task, and also achieved robust performance in 2011 and 2013 ( 14 , 21 , 28 ).

Text mining approaches enable the automatic extraction of such relationships from biological text. A pipeline system combining text-mining modules such as TEES and a gene normalisation component was previously implemented for event extraction and normalisation over large-scale resources ( 29 ). That system is limited to identifying events within a single sentence, and does not consider coreference resolution.

The BEL was originally developed by Selventa, a personalised healthcare organisation, with the goals of providing a formalised representation of biological relationships captured from scientific journal articles, and of supporting computational applications. To date, BEL has been used primarily in manual curation tasks; however, such manual effort cannot scale to the vastness of the biomedical literature ( 30 ). Indeed, Liu et al. ( 22 ) previously sought to address this by introducing a system for automatic generation of BEL statements from the biomedical literature. It uses the TEES system ( 21 ) for extraction of biological events, and translates the extracted events into BEL statements. However, the performance of the system was not formally evaluated in that prior work. Our pipeline for the BEL track is built on this system and we present its first public evaluation.

There have been several efforts addressing coreference resolution for the biomedical literature, though it remains an underexplored problem. The Protein Coreference shared task ( 20 , 31 ) was organised to identify anaphoric coreference links involving proteins and genes, as a supporting task in the BioNLP shared task 2011 ( 27 ). The best performing system ( 32 ) modified an existing system, Reconcile ( 33 ), and achieved 34.1 F-score, with 73.3 Precision and 22.2 Recall. Several studies on biomedical coreference resolution have appeared since the BioNLP 2011 task. Miwa et al. ( 34 ) developed a novel coreference resolution system using a rule-based approach, and improved the performance on the same gold standard corpus, reporting a 55.9 F-score. A coreference resolution module was incorporated into an existing event extraction system, EventMine ( 19 ). In that work, the output of the coreference resolution system was used as additional features for event extraction, and incorporating coreference resolution slightly improved event extraction performance. A hybrid approach combining rule-based and machine learning-based methods has also been employed for biomedical coreference resolution ( 35 , 36 ). D’Souza and Ng ( 36 ) used the combined approach for both mention detection and anaphora resolution. Li et al. ( 35 ) used the combined approach for some types of anaphoric mentions: they use both rule-based and machine learning methods for relative pronoun resolution, while exclusively rule-based approaches are applied for resolution of non-relative pronouns and definite noun phrases. These recent works show that applying different approaches to different anaphora types achieves a substantial improvement compared with the best performing system in the BioNLP 2011 task. However, these coreference resolution systems are not publicly available. In prior work ( 37 ), a general domain coreference system ( 38 ) was evaluated on biomedical text and compared to a biomedical domain-specific system ( 21 ); the results show that domain knowledge can help coreference resolution in the biomedical domain, reporting an F-score of 37% for the biomedical domain-specific system, and an F-score of 2% for the general system.

Methods

BEL statements and dataset

For the BEL track at BioCreative V, sample and training datasets were provided to support system development ( 11 ). The training dataset contains 6358 sentences selected from 3052 PubMed journal articles, and 11 072 BEL statements annotated from these sentences. A sample sentence and its corresponding BEL statements are shown in Figure 1a and b . Each BEL statement is represented as a triple structure of ‘subject-predicate-object’, where subjects and objects are biomolecular entities such as proteins, genes and chemicals with namespace identifiers and their functions, and predicates describe the relationship between these entities. An example BEL statement is shown in Figure 1c . A test dataset was released for evaluation of system performance. It contains 105 sentences from 104 PubMed journal articles in the same format as the training dataset.
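To make the triple structure concrete, the following minimal Python sketch represents the statement of Figure 1c as a subject, a predicate and an object. The class names `BelTerm` and `BelStatement` are hypothetical and are not part of any BEL tooling; the rule of quoting names that contain characters other than letters, digits and underscores is an assumption inferred from the examples in the tables of this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BelTerm:
    """A BEL term: short-form function, namespace and entity name (hypothetical class)."""
    function: str   # e.g. "p" for proteinAbundance, "a" for abundance
    namespace: str  # e.g. "MGI", "CHEBI"
    name: str       # entity name or identifier within the namespace

    def __str__(self) -> str:
        # Assumption: names with spaces or punctuation are wrapped in double quotes.
        needs_quotes = not self.name.replace("_", "").isalnum()
        value = f'"{self.name}"' if needs_quotes else self.name
        return f"{self.function}({self.namespace}:{value})"


@dataclass(frozen=True)
class BelStatement:
    """A 'subject-predicate-object' triple (hypothetical class)."""
    subject: BelTerm
    predicate: str
    obj: BelTerm

    def __str__(self) -> str:
        return f"{self.subject} {self.predicate} {self.obj}"


# The statement described in the caption of Figure 1.
statement = BelStatement(
    subject=BelTerm("a", "CHEBI", "glucocorticoid"),
    predicate="increases",
    obj=BelTerm("p", "MGI", "Resp18"),
)
print(statement)  # a(CHEBI:glucocorticoid) increases p(MGI:Resp18)
```

Keeping the function, namespace and name as separate fields mirrors the way the evaluation later compares these parts individually.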

BEL statements capture relationships between entities (BEL Terms), making use of external vocabularies and ontologies, including namespaces, to represent entities unambiguously. Over 20 different namespaces are defined for BEL statements; for simplicity, the BEL track is limited to six namespaces, which express entity types such as genes, diseases, chemicals and biological processes. The namespaces with their associated functions and occurrence counts in both training and test datasets are described in Table 1 . Across the 11 072 BEL statements in the training data, BEL terms are mostly annotated with human protein coding genes and mouse genes.

Table 1.

| Namespace | Entity concept | Function (long form) | Function (short form) | Example | Count (Train) | Count (Test) |
|---|---|---|---|---|---|---|
| HGNC | Human protein coding genes | proteinAbundance(), geneAbundance(), rnaAbundance(), microRNAAbundance() | p(), g(), r(), m() | p(HGNC:MAPK14) | 7, (33%) | 127 (43%) |
| MGI | Mouse genes | proteinAbundance(), geneAbundance(), rnaAbundance(), microRNAAbundance() | p(), g(), r(), m() | p(MGI:Mapk14) | 12 231 (53%) | 111 (38%) |
| EGID | Genes in a wide range of species | proteinAbundance(), geneAbundance(), rnaAbundance() | p(), g(), r() | p(EGID:1432) | 140 (0.6%) | 0 |
| GOBP | Biological processes | biologicalProcess() | bp() | bp(GOBP:"cell proliferation") | 1927 (8%) | 23 (8%) |
| MESHD | Diseases | pathology() | path() | path(MESHD:Hyperoxia) | 244 (1%) | 11 (4%) |
| CHEBI | Chemicals | abundance() | a() | a(CHEBI:lipopolysaccharide) | 875 (3.8%) | 23 (8%) |

In addition to the abundance functions, five selected functions that describe activities such as modification, transformation or translocation are also in scope for the BEL statements in the BioCreative BEL tasks. BEL terms are arguments of these functions as described in Table 2 . In the training dataset, there are 1351 entities that have a modification activity, and 205 entities for degradation activities.

Table 2.

Other BEL functions ( http://wiki.openbel.org/display/BIOC/BEL+Documentation#BELDocumentation-OtherFunctions ) selected in the BEL track at BioCreative V

| Function | Type | Example | Count (Train) |
|---|---|---|---|
| complex() | complex abundance | (complex(p(MGI:Itga8),p(MGI:Itgb1))) -> bp(GOBP:"cell adhesion") | 758 |
| pmod() | protein modification | p(MGI:Cav1,pmod(P)) -> a(CHEBI:"nitric oxide") | 1,351 |
| deg() | degradation | p(MGI:Lyve1) -> deg(a(CHEBI:"hyaluronic acid")) | 205 |
| tloc() | translocation | a(CHEBI:"brefeldin A") -> tloc(p(MGI:Stk16)) | 101 |
| act() | molecular activity | complex(p(MGI:Cckbr),p(MGI:Gast)) -> act(p(MGI:Prkd1)) | 124 |

System description

Our system consists of four components in a pipeline: coreference resolution, coreference substitution, biomedical event extraction and BEL statement generation, as illustrated in Figure 2 .

Workflow of our system for producing BEL statements from input text with examples.
Figure 2.


Input sentences are processed to identify coreference relations between anaphoric expressions and the mentions they refer to (antecedents). Those anaphoric expressions are replaced with their antecedents in the original sentences to produce resolved versions. Then, the coreference-substituted sentences are submitted to an event extraction system, TEES ( 21 ), and the results of the TEES system are post-processed and converted into BEL statements. In the process of generating BEL statements, gene and protein entities identified by the event extraction system are also normalised using selected resources such as HUGO Gene Nomenclature Committee (HGNC), Entrez Gene Identifier (EGID) and MGI. In this way, we aim to identify events involving biological entities, including those that are described linguistically using anaphoric coreference mentions. The details of each component of the system are described in the following sections.
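The substitution step can be illustrated with a short Python sketch. It assumes that coreference links are available as character offsets of the anaphor together with the antecedent text; this data layout and the function name `substitute_coreferences` are hypothetical, and the single link in the example is supplied by hand rather than produced by a resolver.

```python
from typing import List, Tuple

# (anaphor_start, anaphor_end, antecedent_text); offsets are character
# positions in the sentence. This layout is an assumption for illustration.
CorefLink = Tuple[int, int, str]


def substitute_coreferences(sentence: str, links: List[CorefLink]) -> str:
    """Return the sentence with every anaphor span replaced by its antecedent."""
    resolved = sentence
    # Work from right to left so that earlier offsets remain valid.
    for start, end, antecedent in sorted(links, reverse=True):
        resolved = resolved[:start] + antecedent + resolved[end:]
    return resolved


sentence = ("mdm2 directly binds to the amino-terminal region of p53 and "
            "targets it for degradation through the ubiquitin-proteasome pathway")
start = sentence.index(" it ") + 1          # offset of the pronoun 'it'
links = [(start, start + len("it"), "p53")]  # hand-written link: it -> p53
print(substitute_coreferences(sentence, links))
```

After substitution the sentence mentions p53 explicitly in the clause containing ‘degradation’, which is the form a sentence-level event extractor can work with.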

Coreference resolution

The coreference resolution system was developed using a rule-based approach, tailored to the requirements of the BioNLP-ST’11 Coreference corpus ( 20 ). The coreference resolution system selects anaphoric mentions in the text (anaphor), and determines what the anaphor refers to (antecedent). The system consists of three stages: data pre-processing, identification of anaphoric mentions and determination of antecedents. In the pre-processing step, input texts are tokenised and syntactically parsed using the Stanford parser ( 39 ), and biomedical entities such as genes and proteins are identified using a biomedical Named Entity Recognition (NER) module, BANNER ( 40 ). Then, anaphoric mentions such as pronouns, e.g. ‘it’, ‘its’ and ‘they’, and definite noun phrases containing domain-specific nouns, such as ‘the protein’ and ‘these genes’ are identified in the step of anaphor selection. All noun phrases are considered as antecedent candidates. These candidates are ranked by a set of syntactic and semantic rules, and the top ranked candidate is determined as the antecedent corresponding to an anaphoric mention in the step of antecedent determination. The three basic rules used for the determination of an antecedent are stated below.

Rule 1 : Antecedent candidates which do not agree in number (singular or plural) with an anaphor are filtered out.

Rule 2 : If the anaphor is a definite noun phrase, only antecedent candidates identified as genes and proteins using a biomedical NER module are kept; all others are removed.

Rule 3 : The closest candidate that satisfies the two previous constraints is chosen.
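The three rules can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation evaluated in this article: the `Mention` class, its fields and the token-offset notion of distance are hypothetical, only candidates preceding the anaphor are considered, and the example mentions are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    """A noun phrase or anaphor (hypothetical structure for illustration)."""
    text: str
    position: int                      # token offset in the text
    is_plural: bool
    is_gene_or_protein: bool = False   # would come from an NER module
    is_definite_np: bool = False       # only meaningful for anaphors


def select_antecedent(anaphor: Mention, candidates: List[Mention]) -> Optional[Mention]:
    # Assumption: only noun phrases occurring before the anaphor are candidates.
    pool = [c for c in candidates if c.position < anaphor.position]
    # Rule 1: drop candidates that disagree in number with the anaphor.
    pool = [c for c in pool if c.is_plural == anaphor.is_plural]
    # Rule 2: for definite noun phrases, keep only gene/protein candidates.
    if anaphor.is_definite_np:
        pool = [c for c in pool if c.is_gene_or_protein]
    # Rule 3: choose the closest remaining candidate, if any.
    return max(pool, key=lambda c: c.position, default=None)


candidates = [
    Mention("IL-4", 0, is_plural=False, is_gene_or_protein=True),
    Mention("the cells", 3, is_plural=True),
    Mention("a response", 6, is_plural=False),
]
definite_np = Mention("the protein", 9, is_plural=False, is_definite_np=True)
pronoun = Mention("it", 9, is_plural=False)

print(select_antecedent(definite_np, candidates).text)  # IL-4
print(select_antecedent(pronoun, candidates).text)      # a response
```

The two calls show the effect of Rule 2: the definite noun phrase skips the closer non-protein candidate, whereas the pronoun is simply linked to the closest candidate that agrees in number.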

The syntactic rule (Rule 1) used in our coreference resolution system was adapted from the approach of the Stanford general English coreference system, which links pronominal coreference mentions to their corresponding antecedents ( 41 ), while the semantic rule (Rule 2) was motivated by the approach of Nguyen et al. ( 42 ). Protein and gene entities identified by BANNER, and noun phrases containing such entities, are preferentially considered as antecedents for definite noun phrases containing domain-specific terms such as ‘gene’, ‘protein’, ‘receptor’ and ‘molecule’. Although the performance of this simple coreference resolution system does not reach that of state-of-the-art systems such as those of Miwa et al. ( 34 ), D'Souza and Ng ( 36 ) and Li et al. ( 35 ), it outperforms the best results published for the BioNLP’11 Protein Coreference shared task, as shown in Table 3 . We use our simple coreference system because those systems are not publicly available. More details and an evaluation of this system are available in Choi et al. ( 23 ); further improvement of the system is left as future work.

Table 3.

Performance of our coreference resolution system compared with the best performing system ( 33 ) in the BioNLP-ST’11 Coreference task ( 20 ) and with state-of-the-art coreference resolution systems (the last three rows)

| System | Precision | Recall | F-score |
|---|---|---|---|
| UUtah ( 33 ) | 73.3 | 22.2 | 34.1 |
| Our system ( 44 ) | 46.3 | 50.0 | 48.0 |
| Miwa et al. ( 35 ) | 62.7 | 50.4 | 55.9 |
| D’Souza and Ng ( 37 ) | 67.2 | 55.6 | 60.9 |
| Li et al. ( 36 ) | 67.5 | 69.8 | 68.1 |

Results are based on the Test data of the BioNLP’11 Protein Coreference task.

Event extraction

We employ a state-of-the-art event extraction system, TEES ( 21 ), which was the best performing system in the BioNLP-ST’09 GE task ( 12 ). The system uses a Support Vector Machine to train a model with the GENIA corpus. In general, the TEES system takes biomedical texts as input, and has several preprocessing steps, such as sentence segmentation using GENIA Sentence Splitter ( 43 ), biomedical NER using BANNER ( 40 ), parsing texts using the BLLIP parser ( 44 ) and the Stanford parser ( 39 ), and finding head words. Then, the system identifies events involving identified entities based on a machine learning model for event detection. For our BEL track system, texts altered by the coreference substitution step are submitted to the TEES system as input. Biological events were identified using the TEES GE11 model, trained with the BioNLP-ST’11 GE corpus ( 27 ).

Generation of BEL statements

To generate BEL statements, we adopt a system developed by Liu et al. ( 22 ), which converts events extracted by the TEES system into BEL statements. This BEL generation system makes use of probabilities of triggers and event arguments provided by the TEES system to compute a confidence score for each extracted event, and then translates the events from BioNLP event types into BEL statements.

Table 4 describes BioNLP event types and their corresponding BEL functions with mapping examples. For example, the BioNLP event ‘Protein_catabolism:degradation Theme: p53’ is extracted by the TEES system from the sentence ‘mdm2 directly binds to the amino-terminal region of p53 and targets it for degradation through the ubiquitin-proteasome pathway’, as described in Figure 2 , and this event is converted into ‘deg(p53)’ using the BEL function for degradation. Other BioNLP event types such as ‘Positive_regulation’ and ‘Negative_regulation’ are converted to BEL statements by relating and nesting occurrences of simpler event types. For example, the TEES output ‘Positive_regulation (targets) Cause:mdm2’ in Figure 2 is converted into the BEL statement ‘p(MGI:Mdm2) increases’, since the term ‘targets’ is included in the predefined positive triggers. Similarly, the TEES result ‘Negative_regulation (down-regulator) Cause:IL-4 Theme:C3a’ is converted to the BEL statement ‘p(HGNC:IL4) decreases p(MGI:C3ar1)’, because the term ‘down-regulator’ is one of the negative trigger mentions predefined in the system. In addition, the trigger expression ‘activation’ for the event type ‘Positive_regulation’ is converted to the BEL function ‘act’, which describes molecular activities in BEL; an example is shown in Table 4 . The event type ‘Regulation’ is not considered in the system due to its inherent ambiguity.

Table 4.

Mapping the BioNLP event types into BEL functions

| BioNLP | BEL function | BEL function type | Mapping example |
|---|---|---|---|
| Binding | p() | complex abundance | ‘… binding of several BMPs …’ => p(BMP-6) |
| Gene expression | r() | rna abundance | ‘… B cells induces both Id2 and Id3 expression’ => r(Id1) |
| Localization | tloc() | translocation | ‘…co-Smad (Smad4) and are translocated into the nucleus…’ => tloc(Smad4) |
| Phosphorylation | pmod(P) | phosphorylation | ‘…the phosphorylation level of the PPARalpha…’ => (PPARalpha, pmod(P)) |
| Protein catabolism | deg() | degradation | ‘…p53 and targets it for degradation…’ => deg(p53) |
| Transcription | r() | rna abundance | ‘…High BMP-6 mRNA expression in DLBCL…’ => r(BMP-6) |
| activation in Positive_regulation | act() | molecular activity | ‘…IFN7 in the activated MMP12-treated samples…’ => act(MMP12) |
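The mapping in Table 4 can be expressed as a small lookup, sketched below in Python. This is an illustration only and not the code of the Liu et al. ( 22 ) system: the function name `event_to_bel`, the underscore spelling of the event types and the set of activation triggers are assumptions, and the handling of regulation events as relationships between terms is deliberately left out.

```python
from typing import Optional

# Simple event types and their BEL functions, following Table 4.
SIMPLE_EVENT_TO_BEL = {
    "Binding": "p",
    "Gene_expression": "r",
    "Localization": "tloc",
    "Protein_catabolism": "deg",
    "Transcription": "r",
}

# Hypothetical trigger list; the text names 'activation' and Table 4 shows 'activated'.
ACTIVATION_TRIGGERS = {"activation", "activated"}


def event_to_bel(event_type: str, theme: str, trigger: str = "") -> Optional[str]:
    """Translate one extracted event into a BEL term, or None if unsupported."""
    if event_type == "Regulation":
        return None  # not considered, because of its ambiguity
    if event_type == "Positive_regulation" and trigger.lower() in ACTIVATION_TRIGGERS:
        return f"act({theme})"
    if event_type == "Phosphorylation":
        # Written as in the mapping example of Table 4.
        return f"({theme}, pmod(P))"
    function = SIMPLE_EVENT_TO_BEL.get(event_type)
    return f"{function}({theme})" if function else None


print(event_to_bel("Protein_catabolism", "p53"))               # deg(p53)
print(event_to_bel("Positive_regulation", "MMP12", "activated"))  # act(MMP12)
```

Entities are shown here by their surface form, as in Table 4; in the full pipeline they would additionally carry a namespace after normalisation.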

BioEntity normalisation

In the process of generating BEL statements, a protein normalisation component embedded in the Liu et al. ( 22 ) system normalises protein mentions to concepts in the Protein Ontology ( 45 ). Preliminary work suggested that the coverage provided by the Protein Ontology was insufficient. For protein mentions not covered by the Protein Ontology, our system therefore looks up each mention in the symbol, synonym, alternative name and description fields of the HGNC and MGI resources, using exact string matching. Protein mentions that could not be normalised using the Protein Ontology, HGNC or MGI were excluded. Error analysis suggests that these excluded mentions may relate to other concept types, such as diseases (MeSH Diseases) and chemical compounds (ChEBI).
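A minimal sketch of such a lookup is given below in Python. The in-memory record layout, the field names and the order in which HGNC and MGI are consulted are assumptions made for illustration, and the toy records are not taken from the actual resources; the preceding Protein Ontology step is not modelled.

```python
from typing import Dict, List, Optional, Tuple

# identifier -> {field name -> list of strings in that field} (assumed layout)
Records = Dict[str, Dict[str, List[str]]]

SEARCH_FIELDS = ("symbol", "synonyms", "alternative_names", "description")


def normalise(mention: str, resources: List[Tuple[str, Records]]) -> Optional[str]:
    """Return 'NAMESPACE:identifier' for the first exact match, else None."""
    for namespace, records in resources:
        for identifier, record in records.items():
            for field in SEARCH_FIELDS:
                # Exact, case-sensitive string comparison against each field value.
                if mention in record.get(field, []):
                    return f"{namespace}:{identifier}"
    return None  # unmatched mentions are excluded


# Toy records for illustration only.
resources = [
    ("HGNC", {"MAPK14": {"symbol": ["MAPK14"], "synonyms": ["toy-alias-human"]}}),
    ("MGI", {"Mapk14": {"symbol": ["Mapk14"], "synonyms": ["toy-alias-mouse"]}}),
]

print(normalise("MAPK14", resources))           # HGNC:MAPK14
print(normalise("toy-alias-mouse", resources))  # MGI:Mapk14
print(normalise("hyperoxia", resources))        # None
```

Because matching is exact, the human symbol ‘MAPK14’ and the mouse symbol ‘Mapk14’ resolve to different namespaces, which is the distinction the term-level evaluation is sensitive to.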

Results

Evaluation

System results are evaluated separately at the levels of BEL terms, BEL functions, BEL relationships and full BEL statements, using the standard metrics of Precision (the percentage of responses returned by the system that are correct), Recall (the percentage of correct responses that are returned) and F-score (the harmonic mean of Precision and Recall). The function and relationship levels are also partially evaluated in what is referred to as the Secondary mode. An evaluation web interface is provided ( http://bio-eval.scai.fraunhofer.de/cgi-bin/General_server.rc ), so participants can check the correctness of their system predictions: once the BEL statements predicted by a system are submitted, the result is evaluated at each level. An example of an evaluation is shown in Figure 3 .
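As a worked example of these metrics, the Python sketch below computes Precision, Recall and F-score from counts of true positives, false positives and false negatives. The function name is hypothetical; the counts used (TP = 64, FP = 12, FN = 236) are the term-level counts reported for Run 1 in Table 5.

```python
from typing import Tuple


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) as fractions between 0 and 1."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


p, r, f = precision_recall_f(tp=64, fp=12, fn=236)
print(f"P={100 * p:.1f} R={100 * r:.1f} F={100 * f:.1f}")  # P=84.2 R=21.3 F=34.0
```

The harmonic mean keeps the F-score close to the lower of the two values, which is why high precision combined with low recall still yields a modest F-score.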

Example of an evaluation taken from the web interface. BEL statements in gold standard and system prediction are shown for the example sentence. The evaluation scores are provided for all levels.
Figure 3.


At the Term level (T), a true positive (TP) is an entity that the system identified correctly: it must match a gold standard entity exactly, including the abundance function (see Table 1 ), the associated namespace and the corresponding resource identifier. Identified entities that do not match a gold annotation are false positives (FP), and entities annotated in the gold standard that the system missed are false negatives (FN). As shown in Figure 3 , for instance, the gold standard term ‘p(HGNC:IL12B)’ is an FN because the system predicted ‘p(MGI:IL12b)’ instead, and that prediction is in turn an FP. At the Function level (F), abundance functions and activity functions such as ‘deg’ or ‘act’ are evaluated (see Table 2 ). A prediction is considered a TP if the activity function and the abundance function in its argument both match. At the Secondary Function level (Fs), the main function alone is assessed, ignoring the namespace of the entity. In the example evaluation, the activity function ‘act’ is missed by the system, so it is counted as an FN at both the Function and Secondary Function levels. At the Relationship level (R), the relationship between entities (subject and object) is evaluated. TPs are relationships returned by the system in which the relationship between a subject and an object is correct. Partial matches for relationships are evaluated at the Secondary Relationship level (Rs). A partial match is a correct relationship with an incorrect subject and a correct object, a correct relationship with a correct subject and an incorrect object, or an incorrect relationship with a correct subject and a correct object; these are scored as TPs at the Secondary Relationship level. For the overall evaluation, each BEL statement (S) is assessed at the full BEL statement level for whether it is both correct and complete.
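The term-level counting can be sketched as a comparison of two sets of term strings, as in the Python example below. Treating the terms of a sentence as sets and requiring exact string equality are assumptions made for illustration, and the official evaluation may differ in detail. The mismatching pair is the one discussed for Figure 3; the shared term is added here only to show a true positive.

```python
from typing import Iterable, Tuple


def term_level_counts(gold: Iterable[str], predicted: Iterable[str]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for exact matches between gold and predicted terms."""
    gold_terms, predicted_terms = set(gold), set(predicted)
    tp = len(gold_terms & predicted_terms)   # predicted and in the gold standard
    fp = len(predicted_terms - gold_terms)   # predicted but not in the gold standard
    fn = len(gold_terms - predicted_terms)   # in the gold standard but missed
    return tp, fp, fn


gold = ["p(HGNC:IL12B)", "a(CHEBI:glucocorticoid)"]      # second term is illustrative
predicted = ["p(MGI:IL12b)", "a(CHEBI:glucocorticoid)"]
print(term_level_counts(gold, predicted))  # (1, 1, 1)
```

A single wrong namespace therefore costs twice at this level: the gold term is counted as missed and the predicted term as spurious.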

Results for Task 1

We report the official results of our submitted runs on the test dataset in Table 5 . Results are reported for Runs 1–3 in Stage 1 of BEL track Task 1. Each run used a different approach, as follows:

Table 5.

Official results on test data for BEL task 1 in Stage 1

| Run | Level | TP | FP | FN | P | R | F |
|---|---|---|---|---|---|---|---|
| Run 1 (without coref.) | Term | 64 | 12 | 236 | 84.2 | 21.3 | 34.0 |
| | Function Second. | 3 | 1 | 53 | 75.0 | 5.4 | 10.0 |
| | Function | 3 | 1 | 63 | 75.0 | 4.6 | 8.6 |
| | Relation-Second. | 54 | 5 | 148 | 91.5 | 26.8 | 41.4 |
| | Relation | 32 | 21 | 170 | 60.4 | 15.8 | 25.1 |
| | Statement | 25 | 21 | 177 | 54.4 | 12.4 | 20.2 |
| Run 2 (with coreference) | Term | 64 | 15 | 236 | 81.0 | 21.3 | 33.8 |
| | Function Second. | 4 | 1 | 52 | 80.0 | 7.1 | 13.1 |
| | Function | 3 | 2 | 63 | 60.0 | 4.6 | 8.5 |
| | Relation-Second. | 54 | 8 | 148 | 87.1 | 26.7 | 40.9 |
| | Relation | 32 | 24 | 170 | 57.1 | 15.8 | 24.8 |
| | Statement | 25 | 24 | 177 | 51.0 | 12.4 | 19.9 |
| Run 3 (with coreference and extended BEL function) | Term | 64 | 15 | 236 | 81.0 | 21.3 | 33.8 |
| | Function Second. | 5 | 1 | 51 | 83.3 | 8.9 | 16.1 |
| | Function | 3 | 4 | 63 | 42.9 | 4.6 | 8.2 |
| | Relation-Second. | 54 | 8 | 148 | 87.1 | 26.7 | 40.9 |
| | Relation | 32 | 26 | 170 | 55.2 | 15.8 | 24.6 |
| | Statement | 25 | 26 | 177 | 49.0 | 12.4 | 19.8 |

Run 1, an approach without coreference resolution; Run 2, an approach with coreference resolution; Run 3, a coreference approach with extended BEL function.

Run 1 consists of the basic TEES + BEL mapping system, with no coreference resolution step;

Run 2 uses the complete pipeline, including coreference resolution;

Run 3 extends the complete pipeline of Run 2 with an additional BEL function, act(), as described in Table 4 .

Our system achieved an F-score of 20.2, with Precision 54.4 and Recall 12.4, at the full Statement level in Run 1. Compared with Run 1, incorporating coreference resolution (Run 2) increased the F-score at the Secondary Function level from 10.0 to 13.1, but slightly decreased performance at the other levels. The reason is that the test dataset contains few coreference mentions, as described further in the ‘Comparison of performance with coreference resolution’ section, and the coreference approach produced more FPs than the approach without coreference. Because of the small number of coreference mentions in the test dataset, we use the training dataset for a more rigorous evaluation of system performance with and without the coreference resolution component; the results are presented in the same section. Since our method does not use the training data in any way for supervision, this is a valid evaluation strategy.

In a second test phase (Stage 2), gold standard entities for the test dataset were given by the BioCreative BEL task organisers in order to allow the analysis to focus on the task of event extraction, rather than the task of named entity recognition. This is therefore an “oracle” scenario, where the event extraction step is seeded with perfect information about entities. We provided the gold standard entities as input to the extended system corresponding to Run 3 in Stage 1, and the results are described in Table 6 . When compared with Run 3 in Stage 1, the use of gold standard entities resulted in substantially improved system performance, with an absolute increase of F-score (33.8 vs. 54.3), (8.2 vs. 20.8), (24.6 vs. 43.7) and (19.8 vs. 35.2) at the Term, Function, Relation and Statement levels, respectively.

Table 6.

Results on test data for BEL task 1 in the Stage 2

| Approach | Level | TP | FP | FN | P | R | F |
|---|---|---|---|---|---|---|---|
| *NonCoreference | Term | 101 | 5 | 199 | 95.3 | 33.7 | 49.8 |
| | Function Second. | 8 | 2 | 48 | 80.0 | 14.3 | 24.2 |
| | Function | 7 | 1 | 59 | 87.5 | 10.6 | 18.9 |
| | Relation-Second. | 84 | 3 | 118 | 96.6 | 41.6 | 58.1 |
| | Relation | 57 | 16 | 145 | 78.1 | 28.2 | 41.5 |
| | Statement | 44 | 18 | 158 | 71.0 | 21.8 | 33.3 |
| Coreference | Term | 113 | 3 | 187 | 97.4 | 37.7 | 54.3 |
| | Function Second. | 9 | 4 | 47 | 69.2 | 16.1 | 26.1 |
| | Function | 8 | 3 | 58 | 72.7 | 12.1 | 20.8 |
| | Relation-Second. | 91 | 3 | 111 | 96.8 | 45.1 | 61.5 |
| | Relation | 62 | 20 | 140 | 75.6 | 30.7 | 43.7 |
| | Statement | 48 | 23 | 154 | 67.6 | 23.8 | 35.2 |

Coreference, a coreference approach with extended BEL function using the given gold standard entities; NonCoreference, an approach without coreference resolution with extended BEL function using the given gold standard entities.

To directly assess the impact of coreference resolution in this scenario, we ran a variant of the system without the coreference module in the oracle condition (see the *NonCoreference rows of Table 6 ). This variant was not submitted to the BEL task and is not part of the official results; it was evaluated on the official test data after the end of the shared task, as a subsequent experiment. In contrast to Stage 1, performance with coreference resolution is slightly higher than without it in Stage 2. The coreference approach produced more outputs overall: more TPs, but also more FPs at the Function, Relation and Statement levels than the approach without coreference resolution. Overall, there was a slight performance improvement attributable to coreference resolution over the test data in the oracle condition.

Comparison of performance with coreference resolution

Based on a coreference analysis framework that classifies coreference mentions by type and considers the broader syntactic and semantic characteristics of coreference links ( 46 ), we analysed the gold standard datasets by categorising the types of coreference expressions. The analysis of mention types appears in Table 7 . The training dataset contains 257 personal pronouns (e.g. ‘it’, ‘they’), 411 possessive pronouns (e.g. ‘its’, ‘their’) and 507 definite noun phrases (e.g. ‘the protein’, ‘these genes’), whereas the test dataset contains only six personal pronouns and five possessive pronouns. Relative pronouns such as ‘which’ and ‘that’ were not addressed in this task, since resolving them had a negative impact on event identification. This was determined in an investigation on the training dataset, which showed quantitatively that resolving relative pronouns degraded performance (results not reported in this article).

Table 7.

Statistics of anaphor types in the gold standard dataset at the BioCreative V shared task Track 4 (BEL track)

| Anaphor type | Training dataset: Numbers | Training dataset: Sentence prop. | Test dataset: Numbers | Test dataset: Sentence prop. |
|---|---|---|---|---|
| Relative pronoun | 1313 | 21% | 14 | 13% |
| Personal pronoun | 257 | 4% | 6 | 6% |
| Possessive pronoun | 411 | 6% | 5 | 5% |
| Definite noun phrase | 507 | 8% | 0 | |
| Total | 2488 | | 25 | |

Numbers are counts of occurrence of each anaphoric type, and Sentence prop. is the percentage of all sentences that include at least one anaphor of relevant type.
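The two statistics reported per anaphor type (a count of occurrences, and the share of sentences containing at least one such anaphor) can be sketched as follows. The lexicons here are hypothetical and deliberately tiny, built only from the example expressions quoted in the text; the actual analysis relied on the coreference framework ( 46 ), not on keyword patterns like these.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons taken from the examples in the text
# ('it, they', 'its, their', 'the protein, these genes').
ANAPHOR_PATTERNS = {
    "personal pronoun": re.compile(r"\b(?:it|they)\b", re.IGNORECASE),
    "possessive pronoun": re.compile(r"\b(?:its|their)\b", re.IGNORECASE),
    "definite noun phrase": re.compile(r"\b(?:the protein|these genes)\b", re.IGNORECASE),
}


def anaphor_statistics(sentences):
    """Return {type: (count, percentage of sentences with >= 1 anaphor of that type)}."""
    counts = Counter()
    sentences_with = Counter()
    for sentence in sentences:
        for kind, pattern in ANAPHOR_PATTERNS.items():
            found = len(pattern.findall(sentence))
            counts[kind] += found
            if found:
                sentences_with[kind] += 1
    total = len(sentences)
    return {
        kind: (counts[kind], 100.0 * sentences_with[kind] / total if total else 0.0)
        for kind in ANAPHOR_PATTERNS
    }


example = [
    "It binds its receptor.",
    "The protein is degraded.",
    "IL-13 increases TIMP-1 expression.",
]
for kind, (count, proportion) in anaphor_statistics(example).items():
    print(f"{kind}: {count} mention(s), {proportion:.0f}% of sentences")
```

A keyword match of this kind would also count non-referential uses such as pleonastic ‘it’, so it should be read only as an illustration of how the two columns are defined.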

Table 8 compares our system performance with and without the coreference resolution component on the training dataset, broken down by the types of coreference links defined by the analysis framework ( 46 ); this allows a fine-grained analysis of how different types of coreference affect information extraction. Since no component in the pipeline makes use of the provided training data for development (all components were developed independently of the BEL task, as described in the ‘Methods’ section), we are able to use all training data as test data. Only the 709 sentences in the training data that contain anaphoric expressions were used for this evaluation. Performance is reported by anaphor type at the Term, Function, Fs, Relation, Rs and Statement levels, using the evaluation interface 1 provided for the BEL track.

Table 8.

Comparison of performance between an approach with coreference resolution and an approach without it on anaphoric sentences in the training dataset, in terms of anaphor types

| Anaphor type | Level | Without TP | Without FP | Without FN | Without P | Without R | Without F | With TP | With FP | With FN | With P | With R | With F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Pers. pronoun | T | 34 | 45 | 89 | 43.0 | 27.6 | 33.7 | 55 | 43 | 68 | 56.1 | 44.7 | **49.8** |
| | Fs | 2 | 4 | 44 | 33.3 | 4.4 | 7.7 | 6 | 8 | 40 | 42.9 | 13.0 | **20.0** |
| | F | 2 | 4 | 58 | 33.3 | 3.3 | 6.1 | 6 | 10 | 54 | 37.5 | 10.0 | **15.8** |
| | Rs | 25 | 22 | 54 | 53.2 | 31.7 | 39.7 | 44 | 24 | 35 | 64.7 | 55.7 | **59.9** |
| | R | 5 | 46 | 74 | 9.8 | 6.3 | 7.7 | 16 | 46 | 63 | 25.8 | 20.3 | **22.7** |
| | S | 2 | 48 | 77 | 4.0 | 2.5 | 3.1 | 5 | 55 | 74 | 8.3 | 6.3 | **7.2** |
| Poss. pronoun | T | 82 | 74 | 125 | 52.6 | 39.6 | 45.2 | 100 | 74 | 107 | 57.5 | 48.3 | **52.5** |
| | Fs | 20 | 12 | 75 | 62.5 | 21.1 | 31.5 | 23 | 9 | 72 | 71.9 | 24.2 | **36.2** |
| | F | 13 | 24 | 116 | 35.1 | 10.1 | 15.7 | 17 | 18 | 112 | 48.6 | 13.2 | **20.7** |
| | Rs | 76 | 33 | 74 | 69.7 | 50.7 | 58.7 | 89 | 31 | 61 | 74.2 | 59.3 | **65.9** |
| | R | 27 | 81 | 123 | 25.0 | 18.0 | 20.9 | 34 | 79 | 116 | 30.1 | 22.7 | **25.9** |
| | S | 13 | 85 | 137 | 13.3 | 8.7 | **10.5** | 12 | 87 | 138 | 12.1 | 8.0 | 9.6 |
| Def. NP | T | 27 | 22 | 45 | 55.1 | 37.5 | 44.6 | 36 | 26 | 36 | 58.1 | 50.0 | **53.7** |
| | Fs | 9 | 3 | 22 | 75.0 | 29.0 | 41.9 | 11 | 3 | 20 | 78.6 | 35.5 | **48.9** |
| | F | 4 | 10 | 38 | 28.6 | 9.5 | 14.3 | 10 | 5 | 32 | 66.7 | 23.8 | **35.1** |
| | Rs | 26 | 5 | 23 | 83.9 | 53.1 | 65.0 | 30 | 10 | 19 | 75.0 | 61.2 | **67.4** |
| | R | 10 | 20 | 39 | 33.3 | 20.4 | 25.3 | 16 | 26 | 33 | 38.1 | 32.7 | **35.2** |
| | S | 3 | 23 | 46 | 11.5 | 6.1 | 8.0 | 7 | 29 | 42 | 19.4 | 14.3 | **16.5** |
| ALL | T | 141 | 140 | 255 | 50.2 | 35.6 | 41.7 | 188 | 143 | 208 | 56.8 | 47.5 | **51.7** |
| | Fs | 30 | 18 | 139 | 62.5 | 17.8 | 27.7 | 39 | 19 | 130 | 67.2 | 23.1 | **34.4** |
| | F | 18 | 37 | 209 | 32.7 | 7.9 | 12.8 | 32 | 32 | 195 | 50.0 | 14.1 | **22.0** |
| | Rs | 126 | 60 | 147 | 67.7 | 46.2 | 54.9 | 162 | 65 | 111 | 71.4 | 59.3 | **64.8** |
| | R | 42 | 146 | 231 | 22.3 | 15.4 | 18.2 | 65 | 151 | 208 | 30.1 | 23.8 | **26.6** |
| | S | 18 | 155 | 255 | 10.4 | 6.6 | 8.1 | 23 | 171 | 250 | 11.9 | 8.4 | **9.9** |

The higher F-score (with vs. without coreference) is indicated in bold.

Overall, system performance improves when coreference resolution is incorporated. For personal pronouns, coreference resolution improved Precision, Recall and F-score at every level; at the Term level, for example, Precision increased from 43.0 to 56.1, Recall from 27.6 to 44.7 and F-score from 33.7 to 49.8. Resolving definite noun phrases also improved Precision (28.6 vs. 66.7), Recall (9.5 vs. 23.8) and F-score (14.3 vs. 35.1) at the Function level. (Abbreviations used in Table 8: Pers., Personal; Poss., Possessive; NP, Noun Phrase; ALL, sum of Pers. pronoun, Poss. pronoun and Def. NP; T, Term level; Fs, Function-Secondary level; F, Function level; Rs, Relation-Secondary level; R, Relation level; S, full Statement level.)

Discussion

The task of extracting biomolecular relationships in the form of BEL statements is highly complex. It requires identification of entity types, disambiguation of entities including their namespaces and roles, and correct identification of activity status and of relationships between entities. The shared task made several simplifications: namespaces were restricted to 6 of the 20 used in the full BEL specification, orthologous identifiers were accepted for the HGNC, MGI and EGID namespaces, and simplified statements were tolerated (e.g. ‘act()’ was allowed for ‘kin()’, ‘tscript()’ and ‘cat()’). Even so, the five participating systems achieved low performance at the full statement level, as described in Table 9 . Our system (S3) achieved the best F-score of 20.2%, and systems S4 and S5 achieved slightly lower F-scores. System S4 achieved a much lower F-score of 2.7% at the Function level, which reduced its precision at the full statement level, even though it achieved higher F-scores than our system at the Term and Relation levels. System S5 also achieved lower Precision at the full statement level, even though it achieved the best F-score at the Term, Function and Relation levels; our system, in contrast, was limited largely by Recall. System S4 ( 47 ) used different approaches for each subtask, e.g. a hybrid approach (Conditional Random Fields and dictionary lookup) for identification of entities and abundance functions, a rule-based approach for entity normalisation, and a statistical parser for classification of relationships. System S5 ( 48 ) used existing systems such as PubTator ( 49 ) and BeCAS ( 50 ) for identification of biomedical concepts, a dictionary lookup method for entity normalisation and a rule-based approach for extraction of biological events.

Table 9.

Evaluation results of participating systems for Task 1

| System | Term P | Term R | Term F | Function P | Function R | Function F | Relation P | Relation R | Relation F | Full statement P | Full statement R | Full statement F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S1 | 38.0 | 28.3 | 32.4 | 26.3 | 7.6 | 11.8 | 1.2 | 1.5 | 1.3 | 0.8 | 1.0 | 0.9 |
| S2 | 52.6 | 60.3 | 56.2 | 11.2 | 18.2 | 13.9 | 9.7 | 8.4 | 9.0 | 7.6 | 6.4 | 7.0 |
| S3 (ours) | 84.2 | 21.3 | 34.0 | 75.0 | 4.6 | 8.6 | 60.4 | 15.8 | 25.1 | 54.4 | 12.4 | 20.2 |
| S4 ( 46 ) | 64.2 | 61.0 | 62.6 | 12.5 | 1.5 | 2.7 | 39.6 | 19.8 | 26.4 | 31.2 | 14.4 | 19.7 |
| S5 ( 47 ) | 82.0 | 59.3 | 68.9 | 30.7 | 34.9 | 32.6 | 69.4 | 38.1 | 49.2 | 26.4 | 13.9 | 18.2 |

The best F-score among their submissions is described for each system; adapted from Fluck et al. ( 25 ).

When incorporating coreference resolution, system performance on the training and the test datasets differed substantially. On the training dataset, the coreference resolution approach markedly improved system performance compared with the result without coreference resolution, as shown in Table 8 . On the official test dataset in Stage 1, in contrast, it slightly reduced system performance by producing additional FPs (Run 1 and Run 2 in Table 5 ). However, the test dataset is small and contains few instances of coreference: there are only 11 coreference relations (considering personal and possessive pronouns only) in 105 sentences, as summarised in Table 7 . This small number of coreference mentions is insufficient to evaluate the impact of coreference resolution. With coreference resolution, our system produced four additional BEL statements over the test data compared with the result without coreference resolution. These statements are all FPs, caused by system errors in normalisation of entity mentions to IDs and in identification of events involving entity types other than proteins and genes. We discuss the impact of coreference resolution on event extraction further in the ‘Impact of coreference resolution’ section.

Error analysis

The BEL task requires identification of a range of entity types including genes, diseases, chemicals and biological processes in the input texts, as described in Table 1 . However, our system is limited to identifying events involving gene and protein entities only, because it relies on BANNER and its gene model for entity recognition. Of the 295 entities in the test dataset, 57 are disease, chemical or biological process entities of the types described in Table 1 . Given the limitations of the system, these entities were ignored; no BEL statements involving them could be identified.

There is a notable difference in the results between Stages 1 and 2, the oracle condition. With gold standard entities provided, our system substantially improved overall performance in Stage 2 ( Table 6 ), indicating that improved entity detection would greatly benefit our system. We will expand the range of entity types and address relations involving these entities in future work. For instance, we may be able to build on the work of Funk et al. ( 51 ) to address identification of Gene Ontology and ChEBI terms and DNorm for Diseases ( 52 ).

Our system's performance was also limited by the set of trigger mentions used to produce BEL statements in the original BEL generation system that we employed ( 22 ). The low Recall at the Function and Function-Secondary levels in Table 5 shows that our system failed to capture event trigger mentions associated with many BEL functions. When the original BEL generation system was developed, the trigger mentions were derived from the BioNLP’ST 09 corpus ( 12 ). As a subsequent experiment, we extended the set of trigger mentions using the BioNLP’ST 2011 and 2013 gold standard corpora ( 13 , 14 ). However, this extension did not improve performance, and the results of this further experiment are not presented in this paper. In future work, we will consider other methods to better address this issue.

Impact of coreference resolution

Even though coreference resolution resulted in a slight performance reduction in the final result on the test dataset, the approach has the potential to improve discovery of implied and complex biological events, as indicated by our experiments over the training data. For example, the following passage expresses a relationship between the personal pronoun ‘It’ and the gene TIMP-1.

‘Interestingly, IL-13 did cause an ∼80% decrease in pulmonary a1-AT expression (Figure 13). It also caused a significant increase in TIMP-1 expression that was seen after as little as 1 day and was readily apparent with longer periods of dox administration (Figure 13, and data not shown) ( P < 0.05 for all comparisons)’. (SEN:10028008)

Our system identifies the coreference relationship between the anaphor ‘It’ and the gene IL-13 (antecedent) mentioned in the previous sentence, and automatically substitutes the pronoun with its antecedent. Consequently, the event, ‘IL13 increases TIMP1 expression’ is successfully extracted. This would not be identified without coreference resolution. In the results described in Table 8 , our system including coreference resolution produced more TPs overall, e.g. 188 vs. 141 at the Term level, 32 vs. 18 at the Function level, 65 vs. 42 at the Relation level and 23 vs. 18 at the full Statement level.
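The substitution step can be illustrated with a short Python sketch. It assumes that a coreference resolver has already returned the character span of the anaphor and the text of its antecedent; the function name `substitute_anaphor` and the abridged passage are illustrative only and do not reproduce the actual component.

```python
def substitute_anaphor(text: str, start: int, end: int, antecedent: str) -> str:
    """Replace the anaphor occupying text[start:end] with its antecedent."""
    return text[:start] + antecedent + text[end:]


# Abridged version of the example passage (SEN:10028008).
passage = (
    "Interestingly, IL-13 did cause an ~80% decrease in pulmonary a1-AT expression. "
    "It also caused a significant increase in TIMP-1 expression."
)

# Assumed resolver output: the anaphor 'It' and its antecedent 'IL-13'.
anaphor_start = passage.index("It also")
anaphor_end = anaphor_start + len("It")

resolved = substitute_anaphor(passage, anaphor_start, anaphor_end, "IL-13")
print(resolved)
```

After substitution the second sentence names IL-13 explicitly, so a sentence-level event extractor can relate IL-13 to TIMP-1 without looking at the preceding sentence.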

We also compare the approaches with and without coreference resolution on the training dataset using a statistical significance test (paired t -test) in Table 10 . Differences between the approaches at each evaluation level are significant at the 95% confidence level. (Note: With Coref. performs significantly better than Without Coref. when the t-score falls in [1.699, ∞), and Without Coref. performs significantly better when the t-score falls in (−∞, −1.699]; otherwise, there is no significant difference between the two.)

Table 10.

Results of paired t-test between an approach with coreference resolution and an approach without it on the training dataset for each level

| Comparison | Statistic | Term | Function_S. | Function | Relation_S. | Relation | Statement |
|---|---|---|---|---|---|---|---|
| With coreference vs. without coreference | t | 6.82 | 4.77 | 5.20 | 5.51 | 5.79 | 5.34 |

At the 95% confidence level (df = 29), a t-score beyond ± 1.699 indicates a significant difference; all reported differences are significant.
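For readers who wish to reproduce this kind of comparison, the paired t statistic and the decision rule stated above can be sketched in Python with the standard library alone. The sketch assumes one score per evaluation unit for each approach (df = 29 implies 30 paired observations, but how the pairs were formed is not described here); the function names and the toy scores are hypothetical and are not the study's data.

```python
import math
from statistics import mean, stdev

T_CRITICAL_DF29 = 1.699  # threshold quoted in the text for df = 29


def paired_t_statistic(with_coref, without_coref):
    """t = mean(d) / (sd(d) / sqrt(n)) over the paired differences d."""
    if len(with_coref) != len(without_coref):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(with_coref, without_coref)]
    if len(diffs) < 2:
        raise ValueError("at least two pairs are required")
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def interpret(t_score, critical=T_CRITICAL_DF29):
    """Apply the decision rule given in the note to Table 10."""
    if t_score >= critical:
        return "with coreference better"
    if t_score <= -critical:
        return "without coreference better"
    return "no significant difference"


# Toy scores, for illustration only.
t_toy = paired_t_statistic([2, 4, 6, 8, 10], [1, 2, 3, 4, 5])
print(round(t_toy, 4))

# Decision rule applied to the Term-level t-score reported in Table 10.
print(interpret(6.82))
```

The critical value depends on the degrees of freedom, so the 1.699 threshold applies to the reported t-scores (df = 29) and not to the five-pair toy example.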

Conclusions

To address the BEL task in BioCreative V, we have developed a system for biological event extraction that generates BEL statements from the biomedical literature by incorporating several existing text mining systems. We have also explored how a coreference resolution component can help to improve event extraction. Even though our performance on the official test data did not show a strong benefit from coreference resolution, owing to the small number of coreference instances in that data, we have demonstrated that over a larger data set, coreference resolution does significantly improve overall event extraction performance. The coreference resolution approach has the potential to discover implied relationships among entities, and thus to benefit event and network extraction in the biomedical domain.

The BEL task makes use of six possible namespaces for various biological entity types. However, our system is limited to identifying events involving proteins and genes, and did not emphasise entity normalisation as a primary task. We report a substantial improvement in system performance using the given gold standard entities in the oracle setting of BEL Task 1, Stage 2. In future work, we will expand the scope of named entity recognition to extract events involving other relevant biological concepts and entities, in order to further improve our overall information extraction capability.

Acknowledgements

This work was supported by the University of Melbourne, and by the Australian Federal and Victorian State governments and the Australian Research Council through the ICT Centre of Excellence program, National ICT Australia (NICTA). The project receives funding from the Australian Research Council through a Discovery Project grant, DP150101550.

Conflict of interest. None declared.

References

1

Ananiadou
S.
Pyysalo
S.
Tsujii
J.I.
et al.  . (
2010
)
Event extraction for systems biology by text mining the literature
.
Trends Biotechnol
.,
28
,
381
390
.

2

Rebholz-Schuhmann
D.
Oellrich
A.
Hoehndorf
R.
(
2012
)
Text-mining solutions for biomedical research: enabling integrative biology
.
Nat. Rev. Genet
.,
13
,
829
839
.

3

Gonzalez
G.H.
Tahsin
T.
Goodale
B.C.
et al.  . (
2015
)
Recent advances and emerging applications in text and data mining for biomedical discovery
.
Brief. Bioinform
.,
17
,
33
42
.

4

Demir
E.
Cary
M.P.
Paley
S
. et al.  . (
2010
)
The BioPAX community standard for pathway data sharing
.
Nat. Biotechnol
.,
28
,
935
942
.

5

Hucka
M.
Finney
A.
Sauro
H.M
. et al.  . (
2003
)
The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models
.
Bioinformatics
,
19
,
524
531
.

6

Slater
T.
Song
D.
(
2012
)
Saved by the BEL: ringing in a common language for the life sciences
.
Drug Discov. World Fall
, .

7

Matthews
L.
Gopinath
G.
Gillespie
M
. et al.  . (
2009
)
Reactome knowledgebase of human biological pathways and processes
.
Nucleic Acids Res
.,
37
,
D619
D622
.

8

Le Novere
N.
Bornstein
B.
Broicher
A
. et al.  . (
2006
)
BioModels Database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems
.
Nucleic Acids Res
.,
34
,
D689
D691
.

9

Oda
K.
Matsuoka
Y.
Funahashi
A.
Kitano
H.
et al.  . (
2005
)
A comprehensive pathway map of epidermal growth factor receptor signaling
.
Molecular systems biology
,
1
,
8
24
.

10

Fluck
J.
Madan
S.
Ellendorff
T.R
. et al.  . (
2015
) Track 4 Overview: Extraction of Causal Network Information in Biological Expression Language (BEL). Proceedings of the fifth BioCreative challenge evaluation workshop , Sevilla, Spain.

11

Fluck
J.
Madan
S.
Ansari
S
. et al.  . (
2016
)
Training and evaluation corpora for the extraction of causal relationships encoded in Biological Expression Language (BEL)
.
Database
. Submitted

12

Kim
J.D.
Ohta
T.
Pyysalo
S.
et al.  . (
2009
) Overview of BioNLP’09 shared task on event extraction. Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task . Association for Computational Linguistics, pp.
1
9
.

13

Kim
J.D.
Wang
Y.
Takagi
T.
et al.  . (
2011
) Overview of genia event task in bionlp shared task 2011. Proceedings of the BioNLP Shared Task 2011 Workshop . Association for Computational Linguistics, pp.
7
15
.

14

Kim
J.D.
Wang
Y.
Yasunori
Y.
(
2013
) The genia event extraction shared task, 2013 edition-overview. Proceedings of the BioNLP Shared Task 2013 Workshop . Association for Computational Linguistics, pp.
8
15
.

15

Krallinger
M.
Leitner
F.
Rodriguez-Penagos
C.
et al.  . (
2008
)
Overview of the protein-protein interaction annotation extraction task of BioCreative II
.
Genome Biol
.,
9
,
S4.

16

Krallinger
M.
Vazquez
M.
Leitner
F
. et al.  . (
2011
)
The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text
.
BMC Bioinformatics
,
12
,
S3.

17

Li
C.
Liakata
M.
Rebholz-Schuhmann
D.
(
2014
)
Biological network extraction from scientific literature: state of the art and challenges
.
Brief. Bioinformatics
,
15
,
856
877
.

18

Kim J.D., Ohta T., Pyysalo S. et al. (2011) Extracting bio-molecular events from literature - the BioNLP'09 shared task. Comput. Intell., 27, 513–540.

19

Miwa M., Sætre R., Kim J.D. et al. (2010) Event extraction with complex event classification using rich features. J. Bioinformatics Comput. Biol., 8, 131–146.

20

Nguyen N., Kim J.D., Tsujii J.I. (2011) Overview of the protein coreference task in BioNLP shared task 2011. Proceedings of the BioNLP Shared Task 2011 Workshop. Association for Computational Linguistics, pp. 74–82.

21

Björne J., Salakoski T. (2011) Generalizing biomedical event extraction. Proceedings of the BioNLP Shared Task 2011 Workshop. Association for Computational Linguistics, pp. 183–191.

22

Liu H., Baumgartner W. Jr, Catlett N. et al. (2013) Automatic generation of BEL statements from text-mined biological events. BioLINK SIG, p. 58.

23

Choi M., Zobel J., Verspoor K. (2016) A categorical analysis of coreference resolution errors in biomedical texts. J. Biomed. Inform., 60, 309–318.

24

Choi M., Liu H., Baumgartner W. et al. (2015) Integrating coreference resolution for BEL statement generation. Proceedings of the fifth BioCreative challenge evaluation workshop, Sevilla, Spain.

25

Rinaldi F., Ellendorff T.R., Madan S. et al. (2016) BioCreative V Track 4: a shared task for the extraction of causal network information in biological expression language. Database.

26

Nédellec C., Bossy R., Kim J.-D. et al. (2013) Overview of BioNLP shared task 2013. Proceedings of the BioNLP Shared Task 2013 Workshop, pp. 1–7.

27

Kim J.D., Pyysalo S., Ohta T. et al. (2011) Overview of BioNLP shared task 2011. Proceedings of the BioNLP Shared Task 2011 Workshop. Association for Computational Linguistics, pp. 1–6.

28

Björne J., Heimonen J., Ginter F. et al. (2009) Extracting complex biological events with rich graph-based feature sets. Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task. Association for Computational Linguistics, pp. 10–18.

29

Van Landeghem S., Björne J., Wei C.H. et al. (2013) Large-scale event extraction from literature with multi-level gene normalization. PLoS One, 8, e55814.

30

Baumgartner W.A., Cohen K.B., Fox L.M. et al. (2007) Manual curation is not sufficient for annotation of genomic databases. Bioinformatics, 23, i41–i48.

31

Kim J.D., Nguyen N., Wang Y. et al. (2012) The Genia event and protein coreference tasks of the BioNLP shared task 2011. BMC Bioinformatics, 13, S1.

32

Kim Y., Riloff E., Gilbert N. (2011) The taming of Reconcile as a biomedical coreference resolver. Proceedings of the BioNLP Shared Task 2011 Workshop. Association for Computational Linguistics, pp. 89–93.

33

Stoyanov V., Cardie C., Gilbert N. et al. (2010) Coreference resolution with Reconcile. Proceedings of the ACL 2010 Conference Short Papers. Association for Computational Linguistics, pp. 156–161.

34

Miwa M., Thompson P., Ananiadou S. (2012) Boosting automatic event extraction from the literature using domain adaptation and coreference resolution. Bioinformatics, 28, 1759–1765.

35

Li L., Jin L., Jiang Z. et al. (2014) Coreference resolution in biomedical texts. 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, pp. 12–14.

36

D’Souza J., Ng V. (2012) Anaphora resolution in biomedical literature: a hybrid approach. Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine. ACM, pp. 113–122.

37

Choi M., Verspoor K., Zobel J. (2014) Evaluation of coreference resolution for biomedical text. Proceedings of the SIGIR workshop on Medical Information Retrieval (MEDIR 2014), pp. 9–11.

38

Manning C.D., Surdeanu M., Bauer J. et al. (2014) The Stanford CoreNLP natural language processing toolkit. Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60, Baltimore, Maryland USA, June 23-24, 2014. Association for Computational Linguistics.

39

Chen D., Manning C.D. (2014) A fast and accurate dependency parser using neural networks. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), vol. 1, pp. 740–750.

40

Leaman R., Gonzalez G. (2008) BANNER: an executable survey of advances in biomedical named entity recognition. Pac. Symp. Biocomput., 13, 652–663.

41

Lee H., Chang A., Peirsman Y. et al. (2013) Deterministic coreference resolution based on entity-centric, precision-ranked rules. Comput. Linguist., 39, 885–916.

42

Nguyen N., Kim J.D., Miwa M. et al. (2012) Improving protein coreference resolution by simple semantic classification. BMC Bioinformatics, 13, 304.

43

Sætre R., Yoshida K., Yakushiji A. et al. (2007) AKANE system: protein-protein interaction pairs in BioCreAtIvE2 challenge, PPI-IPS subtask. Proceedings of the Second BioCreative Challenge Workshop, edited by Hirschman L., Krallinger M., Valencia A. Spain: CNIO, pp. 209–212.

44

Charniak E., Johnson M. (2005) Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, pp. 173–180, Ann Arbor, Michigan, 2005. doi:10.3115/1219840.1219862.

45

Natale D.A., Arighi C.N., Barker W.C. et al. (2011) The Protein Ontology: a structured representation of protein forms and complexes. Nucleic Acids Res., 39, D539–D545.

46

Choi M., Verspoor K., Zobel J. (2014) Analysis of coreference relations in the biomedical literature. Australasian Language Technology Association Workshop 2014, pp. 134.

47

Lai P.T., Lo Y.Y., Huang M.S. et al. (2015) NCU-IISR system for BioCreative BEL task. Proceedings of the fifth BioCreative challenge evaluation workshop, Sevilla, Spain.

48

Elayavilli R.K., Rastegar-Mojarad M., Liu H. (2015) Adapting a rule-based relation extraction system for BioCreative V BEL task. Proceedings of the fifth BioCreative challenge evaluation workshop, Sevilla, Spain.

49

Wei C.H., Kao H.Y., Lu Z. (2013) PubTator: a web-based text mining tool for assisting biocuration. Nucleic Acids Res., 41, W518–W522.

50

Nunes T., Campos D., Matos S. et al. (2013) BeCAS: biomedical concept recognition services and visualization. Bioinformatics, 29, 1915–1916.

51

Funk C., Baumgartner W., Garcia B. et al. (2014) Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters. BMC Bioinformatics, 15, 59.

52

Leaman R., Doğan R.I., Lu Z. (2013) DNorm: disease name normalization with pairwise learning to rank. Bioinformatics, 29, 2909–2917.

Author notes

Citation details: Choi,M., Liu,H., Baumgartner,W. et al. Coreference resolution improves extraction of Biological Expression Language statements from texts. Database (2016) Vol. 2016: article ID baw076; doi:10.1093/database/baw076

This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.