A survey of ontology learning techniques and applications Open Access

Table 1

Performance Summary of Ontology Learning Techniques

Techniques		Domain	Performance	References
				Paper	Tools
Linguistic Techniques
Preprocessing	Berkeley Parser	Tourism, Sport	Precision=95.7%	(28)	Text2Onto (75, 120, 121, 122) (http://neon-toolkit.org/wiki/1.x/Text2Onto.html), CRCTOL (28), https://nlp.stanford.edu/software/lex-parser.shtml, http://nlp.cs.berkeley.edu/
	Stanford Parser		Precision=90.3%
	Syntactic Analysis for headword modifier	Chinese Text	Accuracy=83.3%	(29)	https://github.com/kimduho/nlp/wiki/Head-modifier-principle-(or-relation)
Relation Extraction	Lexico-syntactic Parsing	News	Accuracy=75.5%	(40)	Text2Onto (75, 120, 121, 122) (http://neon-toolkit.org/wiki/1.x/Text2Onto.html), CRCTOL (28), ASIUM (117, 118, 119) (http://www-ai.ijs.si/~ilpnet2/systems/asium.html), TextStorm/Clouds (27, 123)
Relation Extraction	Dependency Analysis	Bioinformatics	Accuracy=83.3%	(38)
Statistical Techniques
Term Extraction	C/NC Value	Medical	Precision=89.7%	(26)	OntoGain (72), https://github.com/Neuw84/CValue-TermExtraction
			Computer Science	Precision=86.67%
	Contrastive Analysis	Chinese Text	Precision=70%	(56)	OntoLearn (49, 124, 55, 125), CRCTOL (28), OntoGain (72)
	Co-occurrence Analysis	Biomedical (Cancer)	Precision=67.3%	(62)	Text2Onto (75, 120, 121, 122) (http://neon-toolkit.org/wiki/1.x/Text2Onto.html), https://github.com/gsi-upm/sematch
	Clustering	Tourism	Accuracy=68.52%	(66)	ASIUM (117, 118, 119) (http://www-ai.ijs.si/~ilpnet2/systems/asium.html), Text2Onto (75, 120, 121, 122) (http://neon-toolkit.org/wiki/1.x/Text2Onto.html), https://pythonprogramminglanguage.com/kmeans-text-clustering/
		Tourism	Accuracy=53.2%
Relation Extraction	Formal Concept Analysis	Medical	Precision=47%	(72)	OntoGain (72), https://github.com/xflr6/concepts
		Computer Science	Precision=44%	(72)	OntoGain (72), https://github.com/xflr6/concepts
	Hierarchical Clustering	Medical	Precision=71%	(72)	Text2Onto (75, 120, 121, 122) (http://neon-toolkit.org/wiki/1.x/Text2Onto.html), https://github.com/mstrosaker/hclust
		Cooking	Precision=92.1%	(71)
		Finance	F1 Score=18.51%	(75)
		Tourism	F1 Score=21.4%	(75)
	Association Rule Mining	Medical	Accuracy=72.5%	(72)	Text2Onto (75, 120, 121, 122) (http://neon-toolkit.org/wiki/1.x/Text2Onto.html)
Logical
	Inductive Logic Programming	English	Accuracy=96%	(83)	TextStorm/Clouds (27, 123), Syndikate (126, 11), http://pyke.sourceforge.net/

In addition, for each performance benchmark that a specific ontology learning technique achieves in a given domain, we cite the corresponding tools (column: Tools) and reference papers (column: Paper). Table 1 can serve as a reference point for researchers and practitioners, as it lists the seven most prominent and widely used ontology learning tools along with their respective methodologies. Among these, Text2Onto, ASIUM and CRCTOL are considered hybrid ontology learning tools, as they exploit both linguistic and statistical techniques to extract terms and relations from the underlying corpus. OntoGain and OntoLearn, in contrast, rely solely on statistical methods to perform ontology learning tasks. Similarly, TextStorm/Clouds and Syndikate use only logical techniques to acquire concepts and relations.

Evaluation of ontology learning techniques

Assessing the quality of ontology acquisition is a very important aspect of smart web technology, as it allows researchers and practitioners to assess the resulting ontologies for correctness at the lexical level, coverage at the concept level, wellness at the taxonomic level and adequacy at the non-taxonomic level. Evaluation also makes it possible to refine and remodel the entire ontology learning process when the resulting ontologies do not fit the specific requirements of a user. As discussed earlier, ontology learning is a multi-level process, which makes the evaluation of ontology extraction difficult. Given the complexity of evaluating domain ontologies, many evaluation techniques have been proposed in recent years, and this area is still under active development. All proposed techniques fall under one of the following four categories, which are generally distinguished by the kind of target ontology and the purpose of the evaluation.

Golden standard-based evaluation
Application-based evaluation
Data-driven evaluation
Human evaluation

Table 2 gives an overview of ontology evaluation approaches against various supported evaluation levels of ontology learning.

Table 2

Overview of ontology evaluation approaches

Level	Golden standard	Application-based	Data-driven	Assessment by humans
Lexical, vocabulary, concept and data	x	x	x	x
Hierarchy and taxonomy	x	x	x	x
Other semantic relations	x	x	x	x
Context and application		x		x
Syntactic	x			x
Structure, architecture and design				x

The remainder of this section highlights research that uses each of these evaluation techniques, along with their advantages, challenges and drawbacks.

Golden standard-based evaluation

Golden standard-based evaluation compares the resulting ontology against a predefined benchmark or standard ontology. As the gold standard ontology represents an ideal ontology of a particular domain, comparing the learned ontology with this reference ontology can efficiently validate domain coverage and consistency. The golden standard can be a stand-alone ontology, statistical figures derived from a corpus, or a resource formalized by domain experts. Golden standard-based techniques are also known as ontology mapping or ontology alignment. All measures in this category enable frequent, large-scale evaluation at multiple levels. However, obtaining an appropriate gold ontology can be a huge challenge, since it needs to have been created under conditions and with goals similar to those of the learned ontology. For this reason, most approaches select either human-created taxonomies or reliable taxonomies of a similar domain as the gold standard. Gold standard techniques mostly cover the completeness, conciseness and accuracy of learned ontologies.

Maedche and Staab (60) propose a set of similarity measures for ontologies and an empirical evaluation for different phases of ontology learning. They treat an ontology as a two-layer architecture comprising a lexical and a conceptual layer. Using this ontology model, they compute the similarity between the learned ontology and a reference ontology prepared by experts in the tourism domain. They measure the similarity of ontologies on the basis of lexicon, semantic cotopy and reference functions. Moreover, Ponzetto and Strube (88) extracted a taxonomy from Wikipedia and compared it with a couple of gold standard taxonomies. First, this technique uses a denotational mapper known as ‘lexeme-to-concept’ to map the extracted ontology. Then, semantic similarity is computed through WordNet using various measures, such as that of Leacock and Chodorow (89). Zavitsanos et al. (90), Trokanas et al. (91) and Sfar et al. (92) assess the learned ontology by comparing it with a gold standard ontology. The proposed approach computes the similarity of two ontologies at the lexical and relational levels by transforming the ontological concepts and their attributes into a vector representation. Likewise, Kashyap et al. (93) used a similar approach, taking MEDLINE as the corpus and the MeSH thesaurus as the benchmark to assess their extracted taxonomy. The assessment process compares the constructed taxonomy with the benchmark taxonomy using the following two metrics:

Content quality: It computes the extent of overlap between the labels of both taxonomies in order to measure precision and recall.
Structural quality: It computes the structural validity of all labels. For instance, if two labels appear in an ancestor–descendant relationship in the first taxonomy, then they must have the same ancestor–descendant relationship in the other taxonomy.
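The two metrics above can be illustrated with a small sketch. It is not the implementation used by Kashyap et al. (93): the child-to-parent dictionary representation, the function names and the toy taxonomies are assumptions made here for illustration, and structural quality is approximated as the share of ancestor–descendant pairs of the learned taxonomy that also hold in the benchmark.

```python
def labels(parent_of):
    """All labels of a taxonomy given as a child -> parent mapping (root maps to None)."""
    return set(parent_of) | {p for p in parent_of.values() if p is not None}


def ancestors(parent_of, label):
    """Follow child -> parent links upwards and collect every ancestor of `label`."""
    found = set()
    node = parent_of.get(label)
    while node is not None and node not in found:
        found.add(node)
        node = parent_of.get(node)
    return found


def content_quality(learned, gold):
    """Precision and recall of the learned labels with respect to the benchmark labels."""
    learned_labels, gold_labels = labels(learned), labels(gold)
    shared = learned_labels & gold_labels
    precision = len(shared) / len(learned_labels) if learned_labels else 0.0
    recall = len(shared) / len(gold_labels) if gold_labels else 0.0
    return precision, recall


def structural_quality(learned, gold):
    """Share of learned (descendant, ancestor) pairs that are preserved in the benchmark.

    Only pairs whose two labels occur in both taxonomies are considered.
    """
    shared = labels(learned) & labels(gold)
    pairs = [(d, a) for d in shared for a in ancestors(learned, d) if a in shared]
    if not pairs:
        return 0.0
    preserved = sum(1 for d, a in pairs if a in ancestors(gold, d))
    return preserved / len(pairs)


# Hypothetical toy taxonomies (child -> parent), not taken from MeSH or MEDLINE.
gold = {"animal": None, "mammal": "animal", "dog": "mammal",
        "cat": "mammal", "bird": "animal"}
learned = {"animal": None, "dog": "animal", "cat": "dog", "fish": "animal"}

precision, recall = content_quality(learned, gold)
print(precision, recall, structural_quality(learned, gold))
```

In this toy example, the pair (cat, dog) is an ancestor–descendant pair only in the learned taxonomy, so it lowers the structural score even though both labels are correct at the content level.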

Treeratpituk et al. (94) constructed a taxonomy from a large text corpus. They compared the constructed taxonomy with six benchmark taxonomies. These taxonomies are topic specific and extracted from Wikipedia by exploiting their suggested GraBTax algorithm.

Application-based evaluation

Application-based evaluation, also referred to as ‘Task Based Evaluation’, is an application- and task-oriented approach: it evaluates a given ontology by using it in a specific application to perform some task. The outcome of that task determines the goodness of the ontology, regardless of its structural properties. Task-based methodologies enable the detection of inconsistent concepts and make it possible to evaluate the adaptability of an ontology by analyzing its performance in the context of various tasks (95). In addition, task-based approaches are mostly used to evaluate the compatibility between the employed tool and the ontology, and to measure the speed with which a particular task is completed. Application-based evaluation assesses the correctness, coverage, adequacy and wellness of an ontology with reference to other applications. For instance, suppose an ontology is crafted to improve the results of document retrieval. One may collect some sample queries and check whether the application retrieves more relevant documents once it uses the crafted ontology. Task-based evaluation measures depend mainly on the kind of task. In the case of document retrieval, traditional information retrieval measures such as the F-score can be used (96, 97). Lozano-Tello et al. (98) proposed a technique that enables users to determine how well existing ontologies suit the requirements of their respective systems. Porzel and Malaka (99) evaluated the use of ontological relations in speech recognition, comparing the outcome of the speech recognition system against a human-generated gold standard. Application-based evaluation has several shortcomings, which are highlighted below:

An ontology is evaluated after being used in a particular way by a specific application for a particular task; therefore, it is difficult to generalize its performance.
An ontology can be a minor component of an application, so its impact on the results may be indirect and small.
Different ontologies can be compared only if they can all be embedded into the same application for the same task.
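The document retrieval example mentioned above can be made concrete with a minimal sketch of the F-score comparison. The document identifiers, the two result lists and the function name are hypothetical; the sketch only shows how the same query would be scored with and without the ontology, not how any cited system works.

```python
def f_score(retrieved, relevant, beta=1.0):
    """F-beta score of a retrieved document set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical relevance judgements and result lists for one sample query.
relevant_docs = {"d1", "d2", "d3", "d4"}
baseline_results = ["d1", "d5"]                # application without the ontology
ontology_results = ["d1", "d2", "d3", "d6"]    # same application using the ontology

baseline_f1 = f_score(baseline_results, relevant_docs)
ontology_f1 = f_score(ontology_results, relevant_docs)
print(f"baseline F1 = {baseline_f1:.3f}, with ontology F1 = {ontology_f1:.3f}")
```

In practice the score would be averaged over many sample queries, and, as the shortcomings above note, the result says little about how the ontology would perform in a different application.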

Moreover, Haase and Sure (100) assess the quality of an ontology by determining the extent to which it enables users to retrieve relevant individuals in a particular search. They introduce a cost-based model that estimates the user effort required to reach the desired relevant information. This cost is computed from the complexity of the constructed hierarchy in terms of its breadth and depth.

Data-driven evaluation

Data-driven evaluation, also called corpus-based evaluation (96), uses existing domain-specific knowledge sources (usually textual corpora) to assess how well an ontology covers a particular domain. The major advantage of this approach is that it allows one or more target ontologies to be compared against a specific corpus. Like the golden standard-based approach, it covers the evaluation criteria of completeness, conciseness and accuracy of learned ontologies. The major challenge of data-driven approaches is finding a suitable domain-specific corpus, although this is much easier than finding a good domain-specific benchmark ontology. For instance, Jones and Alani (101) used the Google search engine to find a corpus for a specific user query. After expanding the user query with WordNet, they took the top 100 pages of the Google results as the corpus for evaluation. Many researchers have performed corpus-based evaluation. For example, Brewster et al. (102) described a number of techniques for assessing the structural fit between an ontology and domain knowledge that exists in the form of text corpora. They acquire domain-specific terms from textual corpora using latent semantic analysis. The extent of overlap between these domain-specific terms and the terms appearing in an ontology (i.e. concept names) is used to compute the fit between the ontology and the corpus. Moreover, they proposed a probabilistic methodology to determine the best ontology among all candidate ontologies. Sordo et al. (39) used it to evaluate music relations extracted from unstructured text. Likewise, Patel et al. (103) assessed the coverage of an ontology by retrieving textual data such as concept names and relations from it. The acquired textual data is used as input to a text classification model, which is trained with various standard machine learning methodologies.
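The overlap-based fit between an ontology and a corpus can be sketched as follows. Note the substitution: Brewster et al. (102) extract domain terms with latent semantic analysis, whereas this sketch uses a plain frequency threshold as a stand-in. The stop-word list, the threshold, the sample documents and the concept sets are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "can", "in"}


def corpus_terms(documents, min_count=2):
    """Terms occurring at least `min_count` times in the corpus (frequency stand-in for LSA)."""
    counts = Counter()
    for doc in documents:
        for token in re.findall(r"[a-z]+", doc.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return {term for term, count in counts.items() if count >= min_count}


def corpus_fit(concept_names, terms):
    """Return (share of corpus terms covered by the ontology,
    share of ontology concepts that occur among the corpus terms)."""
    concepts = {name.lower() for name in concept_names}
    matched = concepts & terms
    term_coverage = len(matched) / len(terms) if terms else 0.0
    concept_support = len(matched) / len(concepts) if concepts else 0.0
    return term_coverage, concept_support


# Hypothetical tourism corpus and two candidate ontologies (concept names only).
documents = [
    "The hotel offers a beach view and a pool.",
    "Guests of the hotel can walk to the beach.",
    "The museum is near the hotel.",
]
terms = corpus_terms(documents)
ontology_a = {"Hotel", "Beach", "Museum"}
ontology_b = {"Hotel", "Airport"}

for name, concepts in (("A", ontology_a), ("B", ontology_b)):
    print(name, corpus_fit(concepts, terms))
```

Ranking the candidate ontologies by term coverage gives a simple, non-probabilistic counterpart to selecting the best-fitting ontology for a corpus.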

Human evaluation

Human evaluation of ontologies is generally based on defining and formulating decision criteria for selecting the best ontology from a specified set of candidate ontologies. A numerical score is assigned after evaluating an ontology against each criterion. Finally, a weighted sum of the criterion scores is calculated. This kind of evaluation is also called ‘Criteria Based Evaluation’ (96). Criteria-based selection of this kind is used extensively in many other contexts (e.g. grant applications, tenders etc.). Its major shortcoming is the high manual cost in terms of time and effort. As a result, this approach is not used very often nowadays. Nevertheless, researchers have done a fair amount of work on it. For example, Burton-Jones et al. (104) proposed a list of 10 criteria comprising richness (the number of syntactic features of the formal language that are used by the ontology), lawfulness (the frequency of syntactical errors), interpretability (whether the ontology terms exist in WordNet), clarity (the number of senses that the terms have in WordNet), consistency (the number of inconsistent concepts), accuracy (the number of false statements in the target ontology), comprehensiveness (the total number of concepts in the target ontology, compared to the average for the entire repository of ontologies), authority (the number of ontologies that use concepts from the target ontology), history (the number of accesses made to the target ontology in comparison with other candidate ontologies) and relevance (the total number of statements that involve significant syntactic features). Similarly, Fox et al. (105) present a set of criteria that is more inclined toward manual evaluation and assessment of ontologies. Lozano-Tello et al. (106) formulate a set of 117 criteria, grouped in a framework of three levels. They assess taxonomies on the basis of multi-level properties comprising cost, design qualities, language properties and tools by assigning scores. Moreover, criteria-based evaluation can be classified into two categories, which are discussed below.
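The weighted-sum step of criteria-based evaluation can be sketched as below. The criterion names are borrowed from the list above, but the weights, the 1 to 5 scoring scale and the candidate scores are hypothetical values chosen only to show the arithmetic.

```python
def weighted_score(scores, weights):
    """Weighted sum of criterion scores; every weighted criterion must be scored."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for criteria: {sorted(missing)}")
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


# Hypothetical weights (summing to 1) and expert scores on a 1-5 scale.
weights = {"accuracy": 0.5, "consistency": 0.3, "clarity": 0.2}
candidates = {
    "ontology_A": {"accuracy": 4, "consistency": 3, "clarity": 5},
    "ontology_B": {"accuracy": 5, "consistency": 2, "clarity": 3},
}

totals = {name: weighted_score(scores, weights) for name, scores in candidates.items()}
best = max(totals, key=totals.get)
print(totals, "best:", best)
```

The arithmetic is trivial; the manual cost discussed above lies in having experts assign the individual criterion scores and agree on the weights.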

Structure-based evaluation
Structure-based methodologies explore and measure different structural properties in order to evaluate a given taxonomy. Most proposed structure-based techniques fully automate the entire evaluation process. For example, one may compute the relational density of all existing nodes and the average taxonomic depth. For instance, Fernández et al. (107) examine the effect of various structural ontology measures on ontology quality. After extensive experimentation, they conclude that richly populated ontologies with high depth and breadth values are more likely to be correct. Besides, Gangemi et al. (108) assess ontologies on the basis of the presence of cycles in a directed graph.
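Structural properties of this kind are straightforward to compute automatically. The sketch below derives depth and breadth figures and checks for cycles on a taxonomy stored as a parent-to-children dictionary; the representation, the function names and the toy taxonomy are assumptions for illustration and do not reproduce the measures of any cited work.

```python
from collections import Counter, deque


def depth_by_node(children, root):
    """Breadth-first traversal assigning each reachable node its distance from the root."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in depths:
                depths[child] = depths[node] + 1
                queue.append(child)
    return depths


def structure_metrics(children, root):
    """Maximum depth, average depth and maximum breadth (nodes on the widest level)."""
    depths = depth_by_node(children, root)
    per_level = Counter(depths.values())
    return {
        "max_depth": max(depths.values()),
        "avg_depth": sum(depths.values()) / len(depths),
        "max_breadth": max(per_level.values()),
    }


def has_cycle(children):
    """Depth-first search that reports a cycle when it reaches a node still being visited."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GREY
        for child in children.get(node, []):
            state = color.get(child, WHITE)
            if state == GREY:
                return True
            if state == WHITE and visit(child):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(children))


# Hypothetical toy taxonomy (parent -> children).
taxonomy = {"thing": ["animal", "plant"], "animal": ["dog", "cat"], "plant": ["tree"]}
cyclic = dict(taxonomy, dog=["thing"])  # adds the edge dog -> thing, closing a cycle

print(structure_metrics(taxonomy, "thing"), has_cycle(taxonomy), has_cycle(cyclic))
```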
Complex- and Expert-based evaluation
Complex- and expert-based evaluation measures are numerous and try to capture various aspects and properties of ontology quality. For instance, Alani and Brewster (109) include several ontology evaluation measures, such as density, betweenness and class matching measures, in the ‘AKTiveRank’ system. Moreover, Guarino and Welty (110) assess ontologies through a system known as ‘OntoClean’. OntoClean is based on a set of notions comprising identity, essence and unity. They use these notions to characterize and explore the intended meaning of the classes, relations and properties that are significant for building a specific ontology.

Ontology learning data sets

This section summarizes the characteristics of commonly used data sets and systems in ontology learning. Ontologies are developed with ontology learning techniques from data sets containing unstructured domain-specific documents. For the biological domain, most researchers use OHSUMED (http://davis.wpi.edu/xmdv/datasets/ohsumed) (111, 112, 113) and the Genia Corpus (http://www.geniaproject.org/genia-corpus) (114, 115) for experimentation. Similarly, in the traveling and tourism domain, the data sets for ontology learning are Mecklenburg Vorpommern (116, 75) and Lonely Planet (http://www.lonelyplanet.com/destinations) (116, 75). Two large data sets of the news domain, namely the British National Corpus (http://www.natcorp.ox.ac.uk/) (97) and Reuters-21578 (https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection) (113, 97), are also extensively used for experimentation and evaluation of different ontology learning systems. Table 3 summarizes the characteristics of popular data sets.

Table 3

Summary of Popular Datasets

Corpus	No. of documents	Domain	No. of tokens
Mecklenburg Vorpommern	1047	Tourism	332,000
Lonely Planet	1801	Traveling	1 million
British National Corpus	4124	News	100 million
Reuters-21578	21578	News	218 million
OHSUMED	348566	Biological	NA
Genia Corpus	2000	Biological	400,000
Planet Stories	307	Stories	NA

Industrial applications of ontology learning

A large amount of unstructured and semistructured data is being generated every second in the world. In terms of statistics, almost 2.5 quintillion bytes of data were generated every day in 2017 (https://www.ibm.com/blogs/insights-on-business/consumer-products/2-5-quintillion-bytes-of-data-created-every-day-how-does-cpg-retail-manage-it/). These data are distributed over the internet across various websites in such a way that they are totally disconnected. Storing such a gigantic amount of data requires a lot of resources. Moreover, it is extremely difficult to process such data in order to find useful information. This creates a pressing need for a knowledge representation model that stores such data in a more structured way to enable fast processing and quick retrieval at large scale. A model that enables such a structured representation of data is known as an ontology.

Ontologies are used extensively in information retrieval, question answering and decision support systems. This section illustrates applications of ontologies in diverse industries such as the oil and gas industry, the military, e-government, e-health and e-culture.

Oil and gas industry

The oil and gas industry is one of the most data-intensive industries and generates a huge amount of important data every day. Data are generated from various sources in the form of oil well data, seismic data, drilling and transportation data, customer data and marketing data. Since it is one of the industries that control the balance of power in the world, these data, along with their semantics, are of significant importance, as they can be used to derive very useful information. Soma et al. (127) presented a reservoir management system that uses the semantic web to access and enhance the view of the information present in its core knowledge base. Fluor Corporation’s Accelerating Deployment of ISO 15926 (ADI) (150, 151) project converts ISO 15926 Part 4 (a resource of the oil and gas industry that contains descriptions of plant objects) into RDF/OWL form to make it processable by computer systems. The Norwegian Daily Production Report project implemented an ontology based on the ISO 15926 standard to make data comparison and retrieval easy. Moreover, the workflow and quality of the oil and gas industry can be further improved by integrating the semantic web with the Internet of Things.

Military technology

Diverse military technologies such as drones and weaponized mobile robots produce rapidly growing volumes of battlefield information. Technologists are using the semantic web to manage this massive data load and to assist decision analysis during battle by using the significant information produced by all automated military units. In addition, ontologies are being constructed to organize battlefield information for quick retrieval. Halvorsen and Hansen (152) provided an integrated approach to accessing military information, which uses RDF as the representation and serialization mechanism between various systems and SPARQL as the communication protocol. This approach can be used for threat detection by reasoning over the information provided in RDF triples (128).
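The idea of reasoning over triples can be illustrated with a deliberately simplified sketch. It uses plain Python tuples and a hand-written rule rather than an RDF store or SPARQL engine, and it is not the approach of Halvorsen and Hansen (152); every identifier, predicate and the rule itself are hypothetical.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard, as a variable would in a query."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in triples
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


def potential_threats(triples):
    """Hypothetical rule: a unit observed in a restricted zone that is not known to be friendly."""
    restricted = {zone for zone, _, _ in match(triples, p="type", o="RestrictedZone")}
    threats = set()
    for unit, _, zone in match(triples, p="observedIn"):
        is_friendly = bool(match(triples, s=unit, p="affiliation", o="friendly"))
        if zone in restricted and not is_friendly:
            threats.add(unit)
    return threats


# Hypothetical battlefield facts as (subject, predicate, object) triples.
facts = {
    ("zoneA", "type", "RestrictedZone"),
    ("uav7", "observedIn", "zoneA"),
    ("uav7", "affiliation", "unknown"),
    ("truck2", "observedIn", "zoneA"),
    ("truck2", "affiliation", "friendly"),
    ("truck3", "observedIn", "zoneB"),
}

print(potential_threats(facts))
```

In an actual deployment the facts would live in an RDF store and the rule would be expressed as a query or as ontology axioms, which is what allows different systems to share both the data and the reasoning.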

To standardize available information, support decision making and exchange information effectively, technologists have introduced diverse ontologies such as MilInfo (129) and Air Tasking Order (ATO) (130). The ATO helps to assign aircraft missions. In addition, the Tactic Technique and Procedure Ontology (131) and the Battle Management Ontology (132) assist military decision making and shared information access. Another possible ontology is the soldier ontology (http://rdf.muninn-project.org/ontologies/military.html), which can be generated from the data of both on-duty and retired soldiers. This type of ontology can help in selecting soldiers for specific missions and in keeping track of retired senior soldiers.

E-government

Incorporating ontologies and the semantic web into e-government portals can be very fruitful. Instead of relying only on text, a portal can use the underlying ontology to extract the information that is semantically most meaningful for the query. Such portals are more efficient than simple traditional search portals, which do not consider semantics. Various governmental departments can also keep their knowledge bases in sync by using the underlying ontologies.

Rui et al. (133) presented the concept of a semantic information portal that uses a semantic search algorithm. They not only proposed but also implemented the algorithm to retrieve semantically correct results for queries. Haav (134), on the other hand, described a process by which ontologies can be created for e-governmental data. By making use of these ontologies and semantics, governments can manage their resources effectively and improve their planning and development policies.

E-business and E-commerce

E-business and e-commerce have also started using the semantic web to make important business decisions and to develop smart systems for end users by handling massive amounts of available data efficiently with ontologies. GoodRelations, introduced by Hepp (135), is one such ontology. It is useful for semantic web-based e-commerce platforms because it models e-commerce concepts such as products, prices, discount offers and sales offers. LIB2CO, created by Akanbi (136), is an integrated semantic web platform that offers two major agents. The search agent retrieves semantically correct results for consumer queries by analyzing the metadata attached to products. The ontology agent organizes all the products into an ontology so that the search agent can find them effectively.

Ontologies are also helpful in commerce matchmaking, where the most compatible services and goods are selected for the user. Paolucci et al. (137) developed such a system, which comprises various ontologies and a matchmaker. In addition, a security ontology developed by Ekelhart et al. (138) contributes to the security infrastructure of ontology-based e-commerce and e-business.

E-health and life sciences

The e-health and life sciences industries are also moving toward recording patient data electronically for better processing and quick retrieval. To make this data useful for artificial intelligence applications, the semantics behind the data need to be captured to enable automatic decision making.

European Patient Summary (153) is one such project whose backbone lies in semantic web technologies. In addition, ontologies and semantics have been used by Podgorelec and Pavlic (139) to store and integrate data about mitral valve prolapse syndrome. Kim and Choi (140) presented an electrocardiography ontology for heart diseases and used it to create a knowledge base. Ganguly et al. (141) also worked on eHealth ontologies by addressing the issue of mismatch between conceptual hierarchies in ontologies. Other applications of ontology learning for eHealth include ontologies such as the Human Phenotype Ontology (142), the Translational Medicine Ontology (143) and SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) (144).

Multimedia and E-culture

Every year, a huge amount of multimedia content is released on the internet, including >2500 movies and 1 million songs. The metadata attached to this content, together with its semantics, can be very helpful for multimedia companies, which can use it to build precise and accurate recommendation systems for their customers.

Retrieving relevant images, videos and songs is one of the tasks that can be done using ontologies and semantics. Fan and Li (145) used an ontology-based reasoning system to retrieve the images relevant to queries. Wang et al. (146) used an animal ontology to retrieve and annotate animal images. Liu et al. (147) used a reverse engineering process to generate an image ontology from image data. Ontologies have also been applied to video annotation and retrieval by exploiting the semantics of events happening in the video; Ballan et al. (148) presented one such framework for the annotation and retrieval of video content.

Investigative and digital journalism

The semantic web and the use of numerous ontologies have taken journalism to the next level by allowing journalists to explore hidden and otherwise inaccessible information through deeper search. For instance, the Panama Papers are a gigantic collection of documents containing information about organizations and individuals who evade sanctions and taxes. In its raw form, this information was not readily accessible to journalists. The company Ontotext (https://ontotext.com/) constructed an ontology from these documents to give them more structure and meaning, and enabled a querying mechanism using SPARQL. Similarly, Trump World Data is another result of investigative journalism that has been transformed into structured data for easy information access.

Future directions

Ontology learning is a multidisciplinary task that extracts important terms, concepts, attributes and relations from unstructured text by borrowing techniques from domains such as text classification, natural language processing (NLP) and machine learning. These domains are research intensive and still developing. Natural language processing has several bottlenecks, such as part-of-speech (PoS) tagging, relation extraction from unstructured text, co-reference resolution and named entity recognition. From the results discussed in the section entitled Linguistics for pre-processing, it can be concluded that techniques like PoS tagging and parsing can lead toward the development of better ontologies. With the advancement of NLP techniques, improved PoS taggers and parsers are being introduced that need to be merged into ontology learning systems for better performance. In text classification, researchers are developing new algorithms to select highly discriminative features among the classes. Many term selection algorithms available in this domain [Bi-Normal Separation, Normalized Difference Measure, Odds Ratio, Poisson Ratio, Balanced Accuracy Measure (ACC2) and Distinguishing Feature Selection (154)] need to be introduced into ontology learning for the extraction of terms and concepts.
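To make two of the term selection measures named above concrete, the sketch below scores candidate terms with Odds Ratio and Bi-Normal Separation using their commonly cited definitions from the text classification literature: with tpr and fpr the fractions of in-domain and out-of-domain documents that contain a term, Odds Ratio is tpr(1 - fpr) / ((1 - tpr) fpr) and Bi-Normal Separation is |F^-1(tpr) - F^-1(fpr)|, where F^-1 is the inverse standard normal CDF. The clipping constant `eps`, the document counts and the function names are assumptions made for illustration; they are not taken from (154).

```python
from statistics import NormalDist
from typing import Dict, Tuple


def _rates(tp: int, fp: int, pos: int, neg: int, eps: float = 0.0005) -> Tuple[float, float]:
    """Term rates in the positive and negative class, clipped away
    from 0 and 1 so that the scores below stay finite."""
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return tpr, fpr


def odds_ratio(tp: int, fp: int, pos: int, neg: int) -> float:
    """Odds of the term in the positive class over its odds in the negative class."""
    tpr, fpr = _rates(tp, fp, pos, neg)
    return (tpr * (1 - fpr)) / ((1 - tpr) * fpr)


def bi_normal_separation(tp: int, fp: int, pos: int, neg: int) -> float:
    """Distance between the two rates on the inverse standard normal CDF scale."""
    tpr, fpr = _rates(tp, fp, pos, neg)
    inv = NormalDist().inv_cdf
    return abs(inv(tpr) - inv(fpr))


# Invented counts: term -> (documents containing it in the domain corpus,
#                           documents containing it in the contrast corpus).
POS_DOCS, NEG_DOCS = 100, 100
COUNTS: Dict[str, Tuple[int, int]] = {
    "ontology": (80, 10),
    "the": (95, 94),
    "protein": (5, 60),
}

ranked_by_bns = sorted(
    COUNTS,
    key=lambda t: bi_normal_separation(*COUNTS[t], POS_DOCS, NEG_DOCS),
    reverse=True,
)
```

Bi-Normal Separation is symmetric, so a term that is strongly characteristic of the contrast corpus ("protein" here) also scores high, while a term that is equally common in both corpora ("the") scores near zero. Odds Ratio is one-sided and rewards only terms that favour the domain corpus.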

As far as machine learning is concerned, ontology learning borrows techniques such as clustering and association rule mining (ARM) from this domain. However, improvements can be made by incorporating deep learning into these algorithms. In addition, the exponential growth of textual data on the web is heavily influencing the methods used at different levels of ontology learning. The future of ontology learning is therefore likely to be driven by the immense amount of unstructured web data. We propose the following future directions to further improve the ontology learning process:

Use of social media for data validation
Language independent ontology learning
Scalability of existing ontology learning techniques to cater for larger data sets
Use of crowdsourcing and human-based computation games to perform ontology post processing
Development of more formal or heavyweight ontologies

This section summarizes five prominent challenges of ontology learning and discusses the above-mentioned future directions in the context of these challenges.

Challenge 1: The immense amount of web data exists in different formats and languages. This leads to the production of conflicting and inconsistent ontologies.

Proposed solution:

To resolve this issue, we propose looking for approaches to integrate and homogenize such data. This field has not yet gained enough attention from the ontology learning community. We also propose the use of cross-language ontologies to resolve such issues. There is a need to develop advanced algorithms for ontology learning that are independent of language barriers. Since ontologies are shared conceptualizations, they should be free of lexical information. For example, orange should not be represented lexically as ‘orange’ but rather as a concept to which the words for orange in all languages can be mapped.
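The idea of a concept that is independent of any one language can be sketched as a language-neutral identifier with labels attached per language. The identifiers (`C0001`, `C0002`), the label table and the helper names below are hypothetical and serve only to illustrate the mapping.

```python
from typing import Dict, Optional, Tuple

# Hypothetical concepts: language-neutral id -> {language code: label}.
CONCEPT_LABELS: Dict[str, Dict[str, str]] = {
    "C0001": {"en": "orange", "es": "naranja", "de": "Orange", "fr": "orange"},
    "C0002": {"en": "apple", "es": "manzana", "de": "Apfel", "fr": "pomme"},
}


def build_index(labels: Dict[str, Dict[str, str]]) -> Dict[Tuple[str, str], str]:
    """Invert the label table into (language, lowercased label) -> concept id."""
    index: Dict[Tuple[str, str], str] = {}
    for concept_id, by_language in labels.items():
        for language, label in by_language.items():
            index[(language, label.casefold())] = concept_id
    return index


def concept_of(label: str, language: str,
               index: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Look up the concept a word denotes in a given language, if known."""
    return index.get((language, label.casefold()))


INDEX = build_index(CONCEPT_LABELS)
spanish = concept_of("naranja", "es", INDEX)
german = concept_of("Orange", "de", INDEX)
```

Relations and axioms are then stated once, on the identifiers, and every language shares them. Keying the index by language as well as label keeps words that are spelled alike in two languages from colliding.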

Challenge 2: Ontology learning is still a developing field in which each layer of the ontology learning layer cake is a vast research area that needs improvement. Each stage depends on the results of the previous stage. If one stage produces wrong information, it will affect the later stages and eventually produce low-quality ontologies. For example, if a faulty relation <VladimirPutin> <is-a> <president of Italy> occurs frequently in the data, ontology learning methods will extract it and add it to the final ontology. This will contaminate the underlying knowledge base.

Proposed solution:

To ensure data validity, we propose the use of the social web and folksonomy (collaborative tagging). We can assess the validity of a learned ontology by asking social media users to tag extracted concepts and relations as either correct or incorrect. By comparing the number of users who tag them as correct with the number who tag them as incorrect, we can develop some level of trust in the learned ontology.
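One simple way to turn such tags into a trust level is the share of "correct" votes, withheld until enough users have voted. The sketch below assumes this scoring rule; the threshold of 0.8, the minimum of 10 votes and the vote counts are illustrative choices, not values proposed in the literature.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def trust_score(correct: int, incorrect: int, min_votes: int = 10) -> Optional[float]:
    """Share of users who tagged the item as correct, or None when
    there are too few votes to judge."""
    total = correct + incorrect
    if total < min_votes:
        return None
    return correct / total


def accepted_relations(votes: Dict[Triple, Tuple[int, int]],
                       threshold: float = 0.8,
                       min_votes: int = 10) -> List[Triple]:
    """Keep only relations whose trust score reaches the threshold."""
    accepted = []
    for triple, (correct, incorrect) in votes.items():
        score = trust_score(correct, incorrect, min_votes)
        if score is not None and score >= threshold:
            accepted.append(triple)
    return sorted(accepted)


# Invented vote counts: relation -> (tagged correct, tagged incorrect).
VOTES: Dict[Triple, Tuple[int, int]] = {
    ("VladimirPutin", "is-a", "president of Italy"): (3, 97),
    ("VladimirPutin", "is-a", "president of Russia"): (95, 5),
    ("Rome", "capital-of", "Italy"): (2, 0),
}

accepted = accepted_relations(VOTES)
```

The faulty relation is rejected on its low score, and the relation with only two votes stays undecided rather than being accepted on thin evidence.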

Challenge 3: Scalability of ontology learning techniques to accommodate larger data sets is another major challenge. Most of the techniques and tools used in state-of-the-art ontology learning methodologies are designed for smaller data sets. When applied to bigger data sets, such techniques and tools tend to perform poorly.

Proposed solution:

We suggest more research on scaling the present techniques to accommodate larger data sets without compromising efficiency and quality. This can be encouraged by introducing community challenges like BioASQ, BioCreative and TREC. The incentives offered in such challenges will attract researchers, and improvements will be made to tackle this challenge.

Challenge 4: The quality of learned ontologies is affected by human intervention. We can say that the quality of a learned ontology increases with the amount of human intervention. This is why the semi-automatic ontology acquisition process tends to produce good ontologies. For the automatic ontology learning process, a considerable amount of post processing is required to boost the quality of the ontology, which is another major drawback of fully automated ontology acquisition. It puts a heavy burden on knowledge engineers and domain experts.

Proposed solution:

This post processing stage must be integrated with the original ontology learning framework. To reduce this overhead, we propose drawing on the extensive research in the fields of crowdsourcing and human-based computation games (games with a purpose). These can help lower the cost of ontology revision by involving non-expert humans and interacting with them to achieve post processing goals.

Challenge 5: Lastly, we predict a need to shift from lightweight ontologies to more formal, heavyweight ontologies in the future.

Proposed solution:

To tackle this problem, there is a strong need to strengthen axiom learning techniques so that formal ontologies can take center stage in the future.

The aforementioned challenges and future directions are summarized in Table 4.

Table 4

Summary of Ontology Learning: Challenges and Future Directions

	Challenge	Proposed Solution
1	Diversity of formatted data, multi-lingual data	Novel approaches to integrate and harmonize data; cross-language ontologies; advanced algorithms for ontology learning
2	Lack of automatic ontology validation, faulty ontologies	Use of social web, collaborative tagging and folksonomy; use of search engines for answer validation
3	Scalability of ontology learning techniques	Increase in research to accommodate larger datasets; arrangement of community challenges by governing bodies to increase the research scale of ontology learning techniques
4	Requirement of human intervention for better quality of learned ontologies	Need of automatic post processing techniques; integrate post processing framework with ontology learning framework to boost the quality of ontology; use of research in the fields of crowdsourcing and human-based computation games
5	Lack of heavy weight ontologies	Strengthen axiom learning algorithms

Conclusion

This paper summarizes ontology learning techniques along with evaluation measures and highlights applications of ontology learning in various domains. We observed that a hybrid approach comprising both linguistic and statistical techniques produces better ontologies. However, it is difficult to identify a single best technique, as the performance of ontology learning techniques depends heavily on efficient preprocessing of data in the target domain. After critically analyzing the ontology learning literature, we observed the following trends: for term and concept extraction, many researchers prefer statistical techniques, whereas for relation extraction there is an inclination toward agglomerative clustering and ARM. We also reviewed various evaluation techniques for ontology learning and found that the best form of evaluation is human-based evaluation. In addition, we listed the most widely used ontology learning tools along with their respective methodologies and target domains. Applications of ontology learning in industries such as oil and gas, military and e-health are also discussed. Lastly, we provide comprehensive information about ontology learning challenges and propose solutions to further improve the process of ontology learning, indicating directions for answer validation, language-independent ontology generation and the use of crowdsourcing for ontology post processing.

Conflict of interest. None declared.

References

[1] Maedche and Staab (2001) Ontology learning for the semantic web. IEEE Intell. Syst., 16, 72–79.

[2] Gruber, T.R. (1995) Toward principles for the design of ontologies used for knowledge sharing? Int. J. Hum. Comput. Stud., 43, 907–928.

[3] Cullen and Bryman (1988) The knowledge acquisition bottleneck: time for reassessment? Expert Systems, 5, 216–225.

[4] Chen, Dosyn, Lytvyn et al. (2016) Smart data integration by goal driven ontology learning. In: INNS Conference on Big Data, Springer, Thessaloniki, Greece, 283–292.

[5] Ding and Foo (2002) Ontology research and development. part 2-a review of ontology mapping and evolving. J. Inf. Sci., 28, 375–388.

[6] Gómez-Pérez and Manzano-Macho (2003) A survey of ontology learning methods and techniques. Onto Web Deliverable D 1 (5).

[7] Faure and Nédellec (1998) Asium: learning subcategorization frames and restrictions of selection, Chemnitz, Germany.

[8] Yamaguchi (2001) Acquiring conceptual relationships from domain-specific texts. In: Workshop on Ontology Learning, Levanger, Norway, 38, 69–113.

[9] Shamsfard (2003) Designing the ontology learning model, prototyping in a persian text understanding system. Ph.D. Thesis. Amir Kabir University, Tehran, Iran.

[10] de Chalenda and Brigitte (2000) SVETLAN A System to Classify Nouns in Context. Workshop on Ontology Learning.

[11] Hahn and Romacker (2001) The syndikate text knowledge base generator. In: Proceedings of the First International Conference on Human Language Technology Research. Association for Computational Linguistics, San Diego, 1–6.

[12] Maedche and Staab (2000) Discovering conceptual relations from text. In: ECAI. Berlin, 321, 27.

[13] Craven, McCallum, PiPasquo et al. (1998) Learning to extract symbolic knowledge from the world wide web. Technical Report. School of Computer Science, Carnegie-Mellon University, Pittsburgh, PA.

[14] Shamsfard and Barforoush, A.A. (2003) The state of the art in ontology learning: a framework for comparison. Knowl. Eng. Rev., 18, 293–316.

[15] Buitelaar, Cimiano and Magnini (2005) Ontology learning from text: an overview. In: Ontology Learning from Text: Methods, Evaluation and Applications, Amsterdam, IOS Press, 123, 3–12.

[16] Zhou (2007) Ontology learning: state of the art and open issues. Inf. Technol. Manag., 8, 241–252. https://pdfs.semanticscholar.org/bd0b/ab6fc8cd43c0ce170ad2f4cb34181b31277d.pdf.

[17] Hazman, El-Beltagy, S.R. and Rafea. A survey of ontology learning approaches. Database, 7, 36–43.

[18] Brill (1992) A simple rule-based part of speech tagger. In: Proceedings of the Third Conference on Applied Natural Language Processing. Association for Computational Linguistics, Trento, Italy, 152–155.

[19] Schmid (1994) Probabilistic part-of-speech tagging using decision trees. In: Proceedings of International Conference on New Methods in Language Processing, 1–9 (access date: 11 September 2012).

[20] Lin (1994) Principar: an efficient, broad-coverage, principle-based parser. In: Proceedings of the Fifteenth Conference on Computational Linguistics. Association for Computational Linguistics, Kyoto, Japan, 1, 482–488.

[21] Lin (1998) Dependency-based evaluation of minipar at lrec. In: Proceedings of the Workshop on the Evaluation of Parsing Systems, Granada, Spain. http://www.cs.ualberta.ca/lindek/minipar.htm.

[22] Temperley, Sleator and Lafferty (1993) Parsing english with a link grammar. In: Third International Workshop on Parsing Technologies, Tilburg, Netherlands.

[23] Klein and Manning, C.D. (2003) Accurate unlexicalized parsing. In: Proceedings of the Forty-first annual meeting of the Association for Computational Linguistics, Sapporo, Japan.

[24] Petit, Boisson, J.-C. and Rousseaux (2017) Discovering cultural conceptual structures from texts for ontology generation. In: IEEE 2017 Fourth International Conference on Control, Decision and Information Technologies (CoDIT), St. Paul's Bay, Malta, 0225–0229.

[25] Cunningham, Maynard, Bontcheva et al. (2002) Gate: an architecture for development of robust hlt applications. In: Proceedings of the Fortieth Annual Meeting on Association for Computational Linguistics. Association for Computational Linguistics, Philadelphia, Pennsylvania, 168–175.

[26] Drymonas, E.G. Ontology learning from text based on multi-word term concepts: the ontogain method. M.Sc. Thesis. Technical University of Crete, Greece.

[27] Oliveira, Pereira, F.C. and Cardoso (2001) Automatic reading and learning from text. In: Proceedings of the International Symposium on Artificial Intelligence (ISAI), India.

[28] Jiang and Tan, A.-H. (2010) Crctol: a semantic-based domain ontology learning system. J. Assoc. Inform. Sci. Technol., 61, 150–168.

[29] Hippisley, Cheng and Ahmad (2005) The head-modifier principle and multilingual term extraction. Nat. Lang. Eng., 11, 129–157.

[30] Agustini, Gamallo and Lopes, G.P. (2001) Selection restrictions acquisition for parsing improvement. In: International Conference on Applications of Prolog, Springer, 129–143.

[31] Gamallo, Agustini and Lopes, G.P. Learning subcategorisation information to model a grammar with “co-restrictions”. Modélisation probabiliste du langage naturel. TAL. Traitement automatique des langues, 44, 93–117.

[32] Faure and Nedellec (2016) Knowledge acquisition of predicate argument structures from technical texts using machine learning: the system asium. In: International Conference on Knowledge Engineering and Knowledge Management, Springer, Siguenza, Spain, 329–334.

[33] Belal, M.A.E.-F., Abdel-Galil and Saber, Y.M. (2016) Ontology extraction from text: Related works between arabic and english languages. Int. J., 4.

[34] Hwang, C.H. (1999) Incompletely and imprecisely speaking: using dynamic ontologies for representing and retrieving information. In: KRDB, CEUR-WS, Linköping, Sweden, 21, 14–20.

[35] Sanchez and Moreno (2004) Creating ontologies from web documents. In: Recent advances in artificial intelligence research and development, IOS Press, Amsterdam, 113, 11–18.

[36] Fraga, A.L. and Vegetti (2017) Semi-automated ontology generation process from industrial product data standards. In: III Simposio Argentino de Ontologías y sus Aplicaciones (SAOA)-JAIIO, Córdoba, Argentina, 46 (Córdoba, 2017).

[37] Kang, Patil, Rangarajan et al. (2015) Extraction of manufacturing rules from unstructured text using a semantic framework. In: ASME 2015 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, Boston, V01BT02A033–V01BT02A033.

[38] Ciaramita, Gangemi, Ratsch et al. (2005) Unsupervised learning of semantic relations between concepts of a molecular biology ontology. In: IJCAI, Morgan Kaufmann Publishers, Edinburgh, Scotland, UK, 659–664.

[39] Sordo, Oramas and Espinosa-Anke (2015) Extracting relations from unstructured text sources for music recommendation. In: International Conference on Applications of Natural Language to Information Systems, Springer, Passau, Germany, 369–382.

[40] Hearst, M.A. (1998) Automated discovery of wordnet relations. WordNet: an electronic lexical database, 131–153.

[41] Kaushik and Chatterjee. Automatic relationship extraction from agricultural text for ontology construction. Inform. Process. Agri., 5, 60–73.

[42] Ismail, Abu Bakar and Abd Rahman (2015) Extracting knowledge from English translated Quran using NLP pattern. Jurnal Teknologi, 77, 67–73.

[43] Ismail, Bakar, Z.A. and Rahman, N.A. Ontology learning framework for Quran. Advanced Science Letters, 23, 4175–4178.

[44] Panchenko, Faralli, Ruppert et al. (2016) Taxi at semeval-2016 task 13: a taxonomy induction method based on lexico-syntactic patterns, substrings and focused crawling. In: Proceedings of the Tenth International Workshop on Semantic Evaluation (SemEval-2016). Association for Computational Linguistics (ACL), San Diego, California, 1320–1327.

[45] Atapattu, Falkner and Falkner (2017) A comprehensive text analysis of lecture slides to generate concept maps. Comput. Educ., 115, 96–113.

[46] Snow, Jurafsky and Ng, A.Y. (2005) Learning syntactic patterns for automatic hypernym discovery. In: Advances in Neural Information Processing Systems, 1297–1304.

[47] Sen, Tao and Deokar, A.V. (2015) On the role of ontologies in information extraction. In: Reshaping Society through Analytics, Collaboration, and Decision Support, Springer, Switzerland, 115–133.

[48] Turcato, Popowich, Toole et al. (2000) Adapting a synonym database to specific domains. In: Proceedings of the ACL-2000 Workshop on Recent Advances in Natural Language Processing and Information Retrieval, held in conjunction with the Thirtieth Annual Meeting. Association for Computational Linguistics, Hong Kong, 11, 1–11.

[49] Navigli, Velardi and Gangemi (2003) Ontology learning and its application to automated terminology translation. IEEE Intell. Syst., 18, 22–31.

[50] Frantzi, Ananiadou and Mima (2000) Automatic recognition of multiword terms: the c-value/nc-value method. Int. J. Dig. Libr., 3, 115–130.

[51] Hersh, Buckley, Leone et al. (1994) Ohsumed: an interactive retrieval evaluation and new large test collection for research. In: SIGIR94, Springer, Dublin, Ireland, 192–201.

[52] Milios, Zhang, He et al. (2003) Automatic term extraction and document similarity in special text corpora. In: Proceedings of the Sixth Conference of the Pacific. Association for Computational Linguistics, Yangon, Myanmar, 275–284.

[53] Yang, Zhou and Nyberg (2016) Learning to answer biomedical questions: Oaqa at bioasq 4b. In: Proceedings of the Fourth BioASQ Workshop, Yangon, Myanmar, 23–37.

[54] Chandu, Naik, Chandrasekar et al. (2017) Tackling biomedical text summarization: Oaqa at bioasq 5b. BioNLP, 2017, 58–66.

[55] Navigli and Velardi (2002) Semantic interpretation of terminological strings. In: Proceedings of the Sixth International Conference on Terminology and Knowledge Engineering, Nancy, France, 95–100.

[56] Guo, Qiu and Zhang (2015) Web-based chinese term extraction in the field of study. In: IEEE Eleventh International Conference on Semantics, Knowledge and Grids (SKG), Beijing, China, 133–139.

[57] Xiao, Ruan, Yang et al. (2016) Domain ontology learning enhanced by optimized relation instance in dbpedia. In: LREC.

[58] Resnik (1999) Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. J. Artif. Intell. Res., 11, 95–130.

[59] Senellart, P.P. and Blondel, V.D. (2003) Automatic discovery of similar words. In: Berry (ed). Survey of Text Mining: Clustering, Classification, and Retrieval, Springer, UK.

[60] Maedche and Staab (2002) Measuring similarity between ontologies. In: International Conference on Knowledge Engineering and Knowledge Management, Springer, Siguenza, Spain, 251–263.

[61] Suresu and Elamparithi (2016) Probabilistic relational concept extraction in ontology learning. Int. J. Inform. Technol., 2.

[62] Frikh, Djaanfar, A.S. and Ouhbi (2011) A hybrid method for domain ontology construction from the web. In: KEOD, Springer, Paris, France, 285–292.

[63] Landauer, T.K., Foltz, P.W. and Laham (1998) An introduction to latent semantic analysis. Discourse Process., 25, 259–284.

[64] Rani, Dhar, A.K. and Vyas (2017) Semi-automatic terminology ontology learning based on topic modeling. Eng. Appl. Artificial Intell., 63, 108–125.

[65] Berkhin (2006) A survey of clustering data mining techniques. In: Grouping Multidimensional Data, Springer, United States, 25–71.

[66] Karoui, Aufaure, M.-A. and Bennacer (2007) Contextual concept discovery algorithm. In: FLAIRS Conference, AAAI Press, Key West, Florida, USA, 460–465.

[67] Njike-Fotzo and Gallinari. Learning generalization/specialization relations between concepts–application for automatically building thematic document hierarchies. In: Coupling approaches, coupling media and coupling languages for information retrieval, LE CENTRE DE HAUTES ETUDES INTERNATIONALES D'INFORMATIQUE DOCUMENTAIRE, ACM, 143–155.

[68] Zepeda-Mendoza, M.L. and Resendis-Antonio (2013) Hierarchical agglomerative clustering. In: Encyclopedia of Systems Biology, Springer, United States, 886–887.

[69] Dhillon, I.S., Mallela and Kumar (2003) A divisive information-theoretic feature clustering algorithm for text classification. J. Mach. Learn. Res., 3, 1265–1287.

[70] Ragunath and Sivaranjani (2015) Ontology based text document summarization system using concept terms. ARPN J. Eng. Appl. Sci., 10, 2638–2642.

[71] Faure and Nédellec (1998) A corpus-based conceptual clustering method for verb frames and ontology acquisition. In: LREC Workshop on Adapting Lexical and Corpus Resources to Sublanguages and Applications, LREC, Granada, Spain, Vol. 707, 30.

[72] Drymonas, Zervanou and Petrakis, E.G. (2010) Unsupervised ontology acquisition from plain texts: the ontogain system. In: NLDB. Springer, Cardiff, United Kingdom, 277–287.

[73] Caraballo, S.A. (1999) Automatic construction of a hypernym labeled noun hierarchy from text. In: Proceedings of the Thirty-seventh annual meeting of the Association for Computational Linguistics on Computational Linguistics. Association for Computational Linguistics, ACM, Maryland, USA, 120–126.

[74] Savaresi, S.M., Boley, D.L., Bittanti et al. (2002) Cluster selection in divisive clustering algorithms. In: Proceedings of the 2002 SIAM International Conference on Data Mining, SIAM, Arlington, VA, USA, 299–314.

[75] Cimiano and Staab (2005) Learning concept hierarchies from text with a guided agglomerative clustering algorithm. In: Proceedings of the ICML 2005 Workshop on Learning and Extending Lexical Ontologies with Machine Learning Methods, Bonn, Germany.

[76] Liu, Hsu, Mun, L.-F. et al. (1999) Finding interesting patterns using user expectations. IEEE Trans. Knowl. Data Eng., 11, 817–832.

[77] Idoudi, Ettabaa, K.S., Solaiman et al. (2016) Association rules based ontology enrichment. Int. J. Web Appl., 8, 16–25.

[78] Paiva, Costa, Figueiras et al. (2014) Discovering semantic relations from unstructured data for ontology enrichment: association rules based approach. In: Information Systems and Technologies (CISTI), 2014 9th Iberian Conference on. IEEE, Barcelona, Spain, 1–6.

[79] Ghezaiel, L.B., Latiri, C.C. and Ahmed, M.B. (2012) Ontology enrichment based on generic basis of association rules for conceptual document indexing. In: KEOD, Springer, Barcelona, Spain, 53–65.

[80] Paiva, L.M.S.S. (2015) Semantic relations extraction from unstructured information for domain ontologies enrichment. Ph.D. Thesis in RUN - Universidade NOVA de Lisboa.

[81] Fatemi, Poulin, Raileany, L.E. et al. Using association rule mining to enrich semantic concepts for video retrieval. In: KDIR 2009-International Conference on Knowledge Discovery and Information Retrieval, INSTICC Press, Dublin City University, 6–8.

[82] d’Amato and Learning, N.-S. On extracting rules for: enriching ontological knowledge bases, complementing heterogeneous sources of information, empowering the reasoning process. In: Neural-Symbolic Learning and Reasoning, 56.

[83] Lima, Espinasse, Oliveira et al. (2013) An inductive logic programming-based approach for ontology population from the web. In: International Conference on Database and Expert Systems Applications, Springer, Prague, Czech Republic, 319–326.

[84] Fortuna, Lavrač and Velardi (2008) Advancing topic ontology learning through term extraction. In: Pacific Rim International Conference on Artificial Intelligence, Springer, Hanoi, Vietnam, 626–635.

[85]

Seneviratne,

and Ranasinghe,

(2011) Inductive logic programming in an agent system for ontological relation extraction.

Int. J. Mach. Learn. Comput.

, 1, 344.

[86]

Lisi,

F.A.

and Esposito,

(2008) Foundations of onto-relational learning. In:

International Conference on Inductive Logic Programming

, Springer, Prague, Czech Republic, 158–175.

[87]

Lisi,

F.A.

and Straccia,

(2013) A logic-based computational method for the automated induction of fuzzy ontology axioms.

Fundamenta Informaticae

, 124, 503–519.

[88]

Ponzetto,

S.P.

and Strube,

(2007) Deriving a large scale taxonomy from wikipedia.

AAAI

, 7, 1440–1445.

[89]

Leacock,

and Chodorow,

(1998) Combining local context and wordnet similarity for word sense identification. In:

WordNet: An Electronic Lexical Database

, 49, 265–283.

[90]

Zavitsanos,

, Paliouras,

and Vouros,

G.A.

(2011) Gold standard evaluation of ontology learning methods through ontology transformation and alignment.

IEEE Trans. Knowl. Data Eng.

, 23, 1635–1648.

[91]

Trokanas,

and Cecelja,

(2016) Ontology evaluation for reuse in the domain of process systems engineering.

Comput. Chem. Eng.

, 85, 177–187.

[92]

Sfar,

, Chaibi,

A.H.

, Bouzeghoub,

et al. (2016) Gold standard based evaluation of ontology learning techniques. In:

Proceedings of the Annual ACM Symposium on Applied Computing

, ACM, Salamanca, Spain, 339–346.

[93]

Kashyap,

, Ramakrishnan,

, Thomas,

et al. (2005) Taxaminer: an experimentation framework for automated taxonomy bootstrapping.

Int. J. Web Grid Serv.

, 1, 240–266.

[94]

Treeratpituk,

, Khabsa,

and Giles,

C.L.

(2014)

Graph-based approach to automatic taxonomy generation (GrabTax).

arXiv preprint arXiv:1307.1718

[95]

Sánchez,

, Batet,

, Martínez,

et al. (2015) Semantic variance: an intuitive measure for ontology accuracy evaluation.

Eng. Appl. Artificial Intell.

, 39, 89–99.

[96]

Dellschaft,

and Staab,

(2008) Strategies for the evaluation of ontology learning.

Ontol. Learn. Popul.

, 167, 253–272.

[97]

IJntema,

, Sangers,

, Hogenboom,

et al. (2012) A lexico-semantic pattern language for learning ontology instances from text.

Web Semant.

, 15, 37–50.

[98]

Lozano-Tello,

, Gómez-Pérez,

and Sosa,

(2003) Selection of ontologies for the semantic web. In:

International Conference on Web Engineering

, Springer, Munich, Germany, 413–416.

[99]

Porzel,

and Malaka,

(2004) A task-based approach for ontology evaluation. In:

ECAI Workshop on Ontology Learning and Population,

IOS Press, Valencia, Spain, Citeseer, 1–6.

[100]

Haase,

and Sure,

D3.2.1 usage tracking for ontology evolution. In:

EU-IST Integrated Project (IP)

IST-2005-506826 SEKT

[101]

Jones,

and Alani,

(2006) Content-based ontology ranking.

In:

Ninth International Protégé Conference,

Stanford, CA, 93.

[102]

Brewster,

, Alani,

, Dasmahapatra,

et al.

Data driven ontology evaluation. In:

LREC 2004, Lisbon, Portugal, ELRA (European Language Resources Association),

641–644.

[103]

Patel,

, Supekar,

, Lee,

et al. (2003) Ontokhoj: a semantic web portal for ontology searching, ranking and classification. In:

Proceedings of the Fifth ACM international workshop on Web information and data management

, ACM, Seattle, WA, USA, 58–61.

[104]

Burton-Jones,

, Storey,

V.C.

, Sugumaran,

and Ahluwalia,

(2005) A semiotic metrics suite for assessing the quality of ontologies.

Data Knowl. Eng.

, 55, 84–102.

[105]

Fox,

M.S.

, Barbuceanu,

and Gruninger,

(1995) An organisation ontology for enterprise modelling: preliminary concepts for linking structure and behaviour. In: Enabling Technologies: Infrastructure for Collaborative Enterprises, 1995., Proceedings of the Fourth Workshop on, IEEE, West Virginia, USA, 71–81.

[106]

Lozano-Tello,

and Gómez-Pérez,

(2004) Ontometric: A method to choose the appropriate ontology.

J. Database Manag.

, 2, 1–18.

[107]

Fernández,

, Overbeeke,

, Sabou,

and Motta,

(2009) What makes a good ontology? a case-study in fine-grained knowledge reuse. In:

Asian Semantic Web Conference

, Springer, Bangkok, Thailand, 61–75.

[108]

Gangemi,

, Catenacci,

, Ciaramita,

et al. (2006) Modelling ontology evaluation and validation. In: European Semantic Web Conference, Springer, 140–154.

[109]

Alani,

and Brewster,

(2006) Metrics for ranking ontologies. 4273, 1–15.

[110]

Guarino,

and Welty,

(2004) An overview of Ontoclean. In: Staab, Studer

(eds).

Handbook on Ontologies.

Springer, Berlin, Heidelberg.

[111]

Bloehdorn,

, Cimiano,

and Hotho,

(2006) Learning ontologies to improve text clustering and classification. In:

From Data and Information Analysis to Knowledge Engineering

. Springer, Magdeburg, Germany, 334–341.

[112]

Dollah,

R.B.

and Aono,

(2011) Ontology based approach for classifying biomedical text abstracts.

Int. J. Data Eng.

, 2, 1–15.

[113]

Bloehdorn,

and Hotho,

(2009) Ontologies for machine learning. In:

Handbook on Ontologies

. Springer, Berlin, Heidelberg, 637–661.

[114]

Zavitsanos,

, Paliouras,

and Vouros,

(2008) A distributional approach to evaluating ontology learning methods using a gold standard. In:

Third Ontology Learning and Population Workshop, ECAI, Patras, Greece

[115]

Zavitsanos,

, Petridis,

, Paliouras,

et al. (2008) Determining automatically the size of learned ontologies. In:

ECAI

, IOS Press, Patras, Greece, 178, 775–776.

[116]

Cimiano,

, Hotho,

, Stumme,

et al. (2004) Conceptual knowledge processing with formal concept analysis and ontologies. In:

International Conference on Formal Concept Analysis

, Springer, Sydney, NSW, Australia, 189–207.

[117]

Faure,

and Poibeau,

(2000) First experiments of using semantic knowledge learned by asium for information extraction task using intex. In:

Ontology Learning ECAI-2000 Workshop, Citeseer

, IOS Press, Berlin, Germany, 7–12.

[118]

Zhang,

, Wang,

et al. (2016) A new cognitive model for autonomous ontology learning. In:

Intelligent Systems (IS), 2016 IEEE Eighth International Conference on

. IEEE, Sofia, Bulgaria, 259–264.

[119]

Deb,

C.K.

, Marwaha,

, Arora,

and Das,

(2018) A framework for ontology learning from taxonomic data. In:

Big Data Analytics

, Springer, 29–37.

[120]

Staab,

(2005) Learning concept hierarchies from text with a guided agglomerative clustering algorithm. In: Proceedings of the Workshop on Learning and Extending Lexical Ontologies with Machine Learning Methods, Sydney.

[121]

Barbu,

(2015) Property type distribution in wordnet, corpora and wikipedia.

Expert Syst. Appl.

, 42, 3501–3507.

[122]

Bian,

H.-Z.

and Ha,

(2017)

Conceptual extraction of domain knowledge graph in different data sources. In: Conference of DEStech Transactions on Computer Science and Engineering iceit, Zhuhai, China.

[123]

Pereira,

F.C.

, Oliveira,

and Cardoso,

(2000) Extracting concept maps with clouds. In:

Proceedings of the Argentine Symposium of Artificial Intelligence (ASAI)

Buenos Aires, Argentina

[124]

Missikoff,

, Navigli,

and Velardi,

(2002) Integrated approach to web ontology learning and engineering.

Computer

, 35, 60–63.

[125]

Jain,

, Jain,

and Mishra,

(2015) EHCPRS system as an ontology learning system. In: Computing for Sustainable Global Development (INDIACom), 2015 Second International Conference on. IEEE, New Delhi, 978–984.

[126]

Hahn,

and Romacker,

(2000) Content management in the syndikate system: how technical documents are automatically transformed to text knowledge bases.

Data Knowl. Eng.

, 35, 137–159.

[127]

Soma,

(2008)

Applying semantic web technologies for information management in domains with semi-structured data

. University of Southern California.

[128]

Halvorsen,

and Hansen,

B.J.

(2011) Integrating military systems using semantic web technologies and lightweight agents.

FFI-notat

, 1851, 2011.

[129]

Valente,

, Holmes,

and Alvidrez,

F.C.

(2005) Using a military information ontology to build semantic architecture models for airspace systems. In:

Aerospace Conference

, IEEE, Big Sky, MT, USA, 1–7.

[130]

Frantz,

and Franco,

(2005) A semantic web application for the air tasking order.

Technical report. Air Force Research Lab, Rome–NY Information Directorate

[131]

Lacy,

, Aviles,

, Fraser,

et al. (2005) Experiences using owl in military applications. In:

OWLED

, CEUR-WS, Galway, Ireland, 188.

[132]

Turnitsa,

and Tolk,

(2006)

Battle management language: a triangle with five sides. In:

Proceedings of the Simulation Interoperability Standards Organization (SISO) Spring Simulation Interoperability Workshop (SIW)

, IEEE, Huntsville, AL, USA, 27.

[133]

Rui,

, Nengcheng,

and Zhixue,

(2006) A new approach to a local e-government portal for information management and deep searching.

Wuhan Univ. J. Nat. Sci.

, 11, 1161–1166.

[134]

Haav,

H.-M.

(2011) A practical methodology for development of a network of e-government domain ontologies. In:

Building the e-World Ecosystem

, Springer, Berlin, Heidelberg, 1–13.

[135]

Hepp,

(2008) Goodrelations: an ontology for describing products and services offers on the web.

Knowl. Eng. Pract. Patterns

, 5268, 329–346.

[136]

Akanbi,

A.K.

(2014)

Lb2co: a semantic ontology framework for b2c ecommerce transaction on the internet. In:

International Research Journal of Computer Science

, 4, p. 9,

arXiv preprint arXiv:1401.0943

[137]

Paolucci,

, Sycara,

, Nishimura,

et al. (2003) Toward a semantic web e-commerce. In: Proc. of Sixth Int. Conf. on Business Information Systems (BIS2003), Colorado Springs, USA.

[138]

Ekelhart,

, Fenz,

, Tjoa,

et al. (2007) Security issues for the use of semantic web in e-commerce. In:

Business Information Systems

, Springer, Berlin, Heidelberg, 1–13.

[139]

Podgorelec,

and Pavlic,

(2007) Managing diagnostic process data using semantic web. In:

Computer-Based Medical Systems, 2007. CBMS’07. Twentieth IEEE International Symposium on

, IEEE, Maribor, Slovenia, 127–134.

[140]

Kim,

K.-H.

and Choi,

H.-J.

(2007) Design of a clinical knowledge base for heart disease detection. In:

Computer and Information Technology, 2007. CIT 2007. Seventh IEEE International Conference on

, IEEE, Fukushima, Japan, 610–615.

[141]

Ganguly,

, Chattopadhyay,

, Paramesh,

et al. (2008) An ontology-based framework for managing semantic interoperability issues in e-health. In: e-health Networking, Applications and Services, 2008. HealthCom 2008. Tenth International Conference on, IEEE, Singapore, 73–78.

[142]

Köhler,

, Doelken,

S.C.

, Mungall,

C.J.

et al. (2013) The human phenotype ontology project: linking molecular biology and disease through phenotype data.

Nucleic Acids Res.

, 42, D966–D974.

[143]

Sandun,

, Sumathipala,

and Ganegoda,

G.U.

(2017) Self-evolving disease ontology for medical domain based on web.

Int. J. Fuzzy Logic Intell. Syst.

, 17, 307–314.

[144]

De Silva,

T.S.

, MacDonald,

, Paterson,

et al. (2011) Systematized nomenclature of medicine clinical terms (snomed ct) to represent computed tomography procedures.

Comput. Methods Programs Biomed.

, 101, 324–329.

[145]

Fan,

and Li,

(2006) A hybrid model of image retrieval based on ontology technology and probabilistic ranking. In: Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on, IEEE, Hong Kong, China, 477–480.

[146]

Wang,

, Chia,

L.-T.

and Liu,

(2007) Semantic retrieval with enhanced match-making and multi-modality ontology. In:

Multimedia and Expo, 2007 IEEE International Conference on

, IEEE, Beijing, 516–519.

[147]

Liu,

, Shao,

and Liu,

Ontology-based image retrieval with sift features. In:

Pervasive Computing Signal Processing and Applications (PC-SPA), 2010 First International Conference on

, IEEE, Harbin, 464–467.

[148]

Ballan,

, Bertini,

, Del Bimbo,

and Serra,

(2010) Video annotation and retrieval using ontologies and rule learning.

IEEE MultiMedia

, 17, 80–88.

[149]

Sombatsrisomboon,

, Matsuo,

and Ishizuka,

(2003) Acquisition of hypernyms and hyponyms from the WWW. In: Proceedings of the Second International Workshop on Active Mining, France.

[150]

Paap, O. (2006) Accelerating Deployment of ISO 15926 (ADI).

Technical report. FIATECH Member Meeting.

[151]

Paap, O. and Fluor Corporation (2008) ISO 15926 for interoperability. In:

W3C Workshop on Semantic Web in Oil & Gas Industry

, Houston, TX, USA.

[152]

Halvorsen,

and Hansen,

B.J.

(2011) Integrating military systems using semantic web technologies and lightweight agents.

FFI-notat

, 1851, 2011.

[153]

Krummenacher,

, Simperl,

, Cerizza,

et al. (2009) Enabling the european patient summary through triplespaces.

Comput. Methods Programs Biomed.

, 95, S33–S43.

[155]

Rehman,

, Javed,

and Babri,

H.A.

(2017) Feature selection based on a normalized difference measure for text classification.

Inform. Process. Manag.

, 53, 473–489.