Improving Enterprise Search in the Upstream Oil and Gas Industry by Automatic Query Expansion using a Non-probabilistic Knowledge Representation

Paul H. Cleverley Published in Information Sciences

Year of Publication 2012
    	author = "Paul H. Cleverley",
    	title = "Article: Improving Enterprise Search in the Upstream Oil and Gas Industry by Automatic Query Expansion using a Non-probabilistic Knowledge Representation",
    	journal = "International Journal of Applied Information Systems",
    	year = 2012,
    	volume = 1,
    	number = 1,
    	pages = "25-32",
    	month = "November",
    	note = "Published by Foundation of Computer Science, New York, USA"


Organizations face a vocabulary disconnect between the terminology people use in search and the inherent ambiguity of terminology in their information. The mismatch leads to critical information being missed. This paper discusses how Boolean keyword search, the most commonly used approach in Enterprise search, compares with automatic Query Expansion (QE) using a non-probabilistic Knowledge Representation (KR) created independently of the corpus.
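The contrast between the two approaches can be illustrated with a minimal sketch. The documents, the synonym table `KR`, and the function names below are hypothetical and chosen only for illustration; they are not taken from the study, and matching is done by simple substring tests rather than by a real search engine.

```python
# Hypothetical toy corpus; not taken from the study.
DOCS = {
    1: "porosity of the sandstone reservoir",
    2: "sandstone reservoir permeability study",
    3: "void fraction measured in arenite core",
    4: "porosity log for carbonate reservoir",
}

# Hand-built knowledge representation, created independently of the corpus.
# Each query term maps to the set of variants that express the same concept.
KR = {
    "porosity": {"porosity", "void fraction"},
    "sandstone": {"sandstone", "arenite"},
}


def keyword_search(terms, docs=DOCS):
    """Boolean AND: a document matches only if it contains every literal term."""
    return {doc_id for doc_id, text in docs.items()
            if all(term in text for term in terms)}


def expanded_search(terms, docs=DOCS, kr=KR):
    """AND across concepts, OR across the variants the KR lists for each concept."""
    return {doc_id for doc_id, text in docs.items()
            if all(any(variant in text for variant in kr.get(term, {term}))
                   for term in terms)}


query = ["porosity", "sandstone"]
print(sorted(keyword_search(query)))   # [1]
print(sorted(expanded_search(query)))  # [1, 3]
```

In this toy case the expanded query also retrieves document 3, which describes the same concepts in different words, while documents 2 and 4 are still excluded because each lacks one of the two concepts.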

The tests focused on the initial search results list. Optional recommendation or ‘what’s related’ options or facets were out of scope. Testing was performed on a globally created document library collection from one of the largest corporations in the world. QE recalled, on average, an additional 43% of relevant precise results in a single search, without a commensurate cost to information precision.

It is well known from set theory that, as more words are combined with an AND operator in a keyword search, fewer results are returned. However, it was also observed that, as more words are used in a keyword-only search, the relevant results returned, as a proportion of all relevant results in the corpus, decrease. This narrow search paradox means that, in general terms, when more search words are used in a query to help locate relevant information, proportionally more relevant information is actually missed. This is caused by the compounding of the words’ semantic fields and possible linguistic variants. It is believed this is the first time the effect has been modeled in this context, with wider significance in Information Retrieval (IR).
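The compounding effect can be sketched with a deliberately simplified model, which is an assumption made here for illustration and not the model developed in the paper. Suppose each query word matches the wording actually used in a relevant document with some probability, independently of the other words. The expected recall of an AND query is then the product of those probabilities, so it shrinks with every word added. The function names and the value 0.7 are hypothetical.

```python
from math import prod


def expected_recall(match_probs):
    """Expected recall of an AND query under the independence assumption.

    match_probs[i] is the assumed probability that a relevant document
    expresses concept i using the searcher's exact word.
    """
    return prod(match_probs)


def recall_by_query_length(p, max_words):
    """Expected recall for queries of 1..max_words words, each matching with probability p."""
    return [round(expected_recall([p] * n), 3) for n in range(1, max_words + 1)]


print(recall_by_query_length(0.7, 4))  # [0.7, 0.49, 0.343, 0.24]
```

Under these assumed numbers, a three-word query would retrieve roughly a third of the relevant documents, even though each word on its own matches most of them.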


Enterprise search, Web Digital Library, Query Expansion, Knowledge Representation, Information Retrieval, Semantic Ambiguity, Taxonomy, Ontology, Combinatorial Linguistic Explosion, Petroleum Exploration and Production (E&P)