Research Article

Luppar: Information Retrieval for Closed Text Document Collections

by Fabiano Tavares da Silva, Jose Everardo Bessa Maia
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 12 - Number 28
Year of Publication: 2020
Authors: Fabiano Tavares da Silva, Jose Everardo Bessa Maia
10.5120/ijais2020451846

Fabiano Tavares da Silva, Jose Everardo Bessa Maia. Luppar: Information Retrieval for Closed Text Document Collections. International Journal of Applied Information Systems. 12, 28 (March 2020), 1-6. DOI=10.5120/ijais2020451846

@article{ 10.5120/ijais2020451846,
author = { Fabiano Tavares da Silva and Jose Everardo Bessa Maia },
title = { Luppar: Information Retrieval for Closed Text Document Collections },
journal = { International Journal of Applied Information Systems },
issue_date = { March 2020 },
volume = { 12 },
number = { 28 },
month = { March },
year = { 2020 },
issn = { 2249-0868 },
pages = { 1-6 },
numpages = {9},
url = { https://www.ijais.org/archives/volume12/number28/1079-2020451846/ },
doi = { 10.5120/ijais2020451846 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2023-07-05T19:10:17.718555+05:30
%A Fabiano Tavares da Silva
%A Jose Everardo Bessa Maia
%T Luppar: Information Retrieval for Closed Text Document Collections
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 12
%N 28
%P 1-6
%D 2020
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This article presents Luppar, an information retrieval tool for closed collections of text documents that uses a local distributional semantic model associated with each corpus. The system performs automatic query expansion by combining the distributional semantic model with local context analysis, and it supports relevance feedback. The system was evaluated on databases from different domains, where it achieved results equal to or better than those published in the literature.
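
The abstract does not give implementation details, so the following is only a minimal illustrative sketch of the general idea of combining a corpus-local distributional model with local context analysis for query expansion. It is not the authors' method. Every name here (`cooccurrence_vectors`, `expand_query`, the `alpha` weight, the window size, the toy ranking function) is a hypothetical choice made for illustration. Candidate terms are drawn from the top-ranked documents (a simplified stand-in for local context analysis, used here as pseudo-relevance feedback) and scored by mixing their frequency in those documents with their cosine similarity to the query terms in a count-based co-occurrence space.

```python
from collections import Counter, defaultdict
import math


def tokenize(text):
    # Assumption: whitespace tokenization, no stemming or stopword removal.
    return text.lower().split()


def cooccurrence_vectors(docs, window=2):
    """Count-based distributional model: term -> Counter of context terms."""
    vectors = defaultdict(Counter)
    for doc in docs:
        toks = tokenize(doc)
        for i, term in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[term][toks[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(docs, query_terms):
    """Toy ranking (hypothetical): order documents by query-term counts."""
    scores = []
    for i, doc in enumerate(docs):
        toks = tokenize(doc)
        scores.append((sum(toks.count(q) for q in query_terms), i))
    ordered = sorted(scores, key=lambda s: (-s[0], s[1]))
    return [i for score, i in ordered if score > 0]


def expand_query(docs, query, k_docs=2, n_terms=2, alpha=0.5):
    """Return the query terms plus n_terms expansion terms."""
    q_terms = tokenize(query)
    vectors = cooccurrence_vectors(docs)
    top = rank(docs, q_terms)[:k_docs]

    # Local step: candidates come only from the top-ranked documents.
    local = Counter()
    for i in top:
        local.update(t for t in tokenize(docs[i]) if t not in q_terms)
    max_local = max(local.values(), default=1)

    # Mix distributional similarity with normalized local frequency.
    scored = {}
    for term, freq in local.items():
        sim = max(
            (cosine(vectors[term], vectors[q]) for q in q_terms if q in vectors),
            default=0.0,
        )
        scored[term] = alpha * sim + (1 - alpha) * freq / max_local

    best = sorted(scored, key=lambda t: (-scored[t], t))[:n_terms]
    return q_terms + best
```

Calling `expand_query(docs, "cat")` on a small list of document strings returns the original query term followed by up to `n_terms` added terms. In this sketch, `alpha` controls how much weight the distributional similarity receives relative to local frequency; a real system would also replace the toy ranking with a proper retrieval model and filter stopwords before selecting expansion terms.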

Index Terms

Computer Science
Information Sciences

Keywords

Information Retrieval, Distributional Semantic Model, Local Context Analysis, Closed Document Collection