Research Article

Performance Analysis of Layered Vector Space Model in Web Information Retrieval

by Jayant Gadge, Suneeta Sane, H.b. Kekre
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 8 - Number 5
Year of Publication: 2015

Jayant Gadge, Suneeta Sane, H.b. Kekre. Performance Analysis of Layered Vector Space Model in Web Information Retrieval. International Journal of Applied Information Systems. 8, 5 (March 2015), 7-15. DOI=10.5120/ijais15-451320

@article{ 10.5120/ijais15-451320,
author = { Jayant Gadge and Suneeta Sane and H.b. Kekre },
title = { Performance Analysis of Layered Vector Space Model in Web Information Retrieval },
journal = { International Journal of Applied Information Systems },
issue_date = { March 2015 },
volume = { 8 },
number = { 5 },
month = { March },
year = { 2015 },
issn = { 2249-0868 },
pages = { 7-15 },
numpages = {9},
url = { },
doi = { 10.5120/ijais15-451320 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%A Jayant Gadge
%A Suneeta Sane
%A H.b. Kekre
%T Performance Analysis of Layered Vector Space Model in Web Information Retrieval
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 8
%N 5
%P 7-15
%D 2015
%I Foundation of Computer Science (FCS), NY, USA

Information on the web is growing exponentially. This unprecedented growth of available information, coupled with the vast number of online activities, has introduced a new wrinkle to the problem of web search: it is difficult to retrieve relevant information. In this context, search engines have become a valuable tool for users to retrieve relevant information, yet finding information that matches a user's need is still a challenge. Various retrieval models have been proposed and empirically validated for finding web pages relevant to users' queries. The vector space model is one of the most extensively used models for web information retrieval, but it ignores the importance of terms with respect to their position when calculating term weights. In this paper, a new approach based on the vector space model, referred to as the Layered Vector Space model, is proposed and validated. In the Layered Vector Space approach, the importance of terms with respect to their position is considered: the web document is conceptually segmented into N layers according to its organization, and weights are assigned to terms appearing in different layers based on their occurrence within the document. The proposed layered vector space approach is compared with other token-based similarity measures: the vector space model, Jaccard similarity, Dice similarity, Pearson's coefficient and PMI-IR.
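The layering idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer names (`title`, `headings`, `body`), the values in the hypothetical `LAYER_WEIGHTS` table, the whitespace tokenizer and the use of raw term counts are all assumptions, because the abstract does not give the paper's actual segmentation or weighting scheme. Each term occurrence contributes its layer's weight to the document vector, and documents are then ranked by cosine similarity with the query, exactly as in the plain vector space model.

```python
import math
from collections import Counter

# HYPOTHETICAL layer weights: the abstract does not specify the paper's
# layers or weights, so these values are illustrative assumptions only.
LAYER_WEIGHTS = {"title": 3.0, "headings": 2.0, "body": 1.0}


def tokenize(text):
    """Lower-case whitespace tokenizer (an assumption; no stemming or stop words)."""
    return text.lower().split()


def flat_vector(layers):
    """Plain VSM term-frequency vector: every occurrence counts 1, position ignored."""
    vec = Counter()
    for text in layers.values():
        vec.update(tokenize(text))
    return vec


def layered_vector(layers, layer_weights=LAYER_WEIGHTS):
    """Layered vector: each occurrence adds the weight of the layer it appears in."""
    vec = Counter()
    for name, text in layers.items():
        weight = layer_weights.get(name, 1.0)  # unknown layers default to weight 1.0
        for term in tokenize(text):
            vec[term] += weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = Counter(tokenize("vector space model"))
# Same words, different positions: the query terms sit in doc_a's title
# but only in doc_b's body.
doc_a = {"title": "vector space model", "headings": "", "body": "retrieval of web pages"}
doc_b = {"title": "web pages", "headings": "", "body": "retrieval of vector space model"}

for name, doc in [("doc_a", doc_a), ("doc_b", doc_b)]:
    print(name,
          round(cosine(query, flat_vector(doc)), 4),
          round(cosine(query, layered_vector(doc)), 4))
```

With these toy inputs both documents contain the same words, so the plain vector space scores are identical, while the layered scores rank `doc_a` above `doc_b` because its query terms appear in the more heavily weighted title layer.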
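For comparison, the token-based baselines named in the abstract can be sketched in the same style. This is a hedged illustration, not the paper's evaluation code: the `tokenize` helper is an assumed whitespace tokenizer, Jaccard and Dice are computed on sets of distinct terms, Pearson's coefficient is computed on term-frequency vectors over the union vocabulary, and PMI-IR is shown in one common pointwise-mutual-information form, log2(N * hits(p, q) / (hits(p) * hits(q))), from hit counts that a search engine would supply. The hit counts in the example are made-up numbers, and how the paper applies PMI-IR to query-document pairs is not stated in the abstract.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (an assumption; no stemming or stop words)."""
    return text.lower().split()


def jaccard(a_terms, b_terms):
    """Size of the intersection over size of the union of the distinct-term sets."""
    a, b = set(a_terms), set(b_terms)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a_terms, b_terms):
    """Twice the intersection size over the sum of the two set sizes."""
    a, b = set(a_terms), set(b_terms)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def pearson(a_terms, b_terms):
    """Pearson correlation of term-frequency vectors over the union vocabulary."""
    ca, cb = Counter(a_terms), Counter(b_terms)
    vocab = sorted(ca.keys() | cb.keys())
    if not vocab:
        return 0.0
    x = [ca[t] for t in vocab]
    y = [cb[t] for t in vocab]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy) if sx and sy else 0.0  # constant vector: undefined, use 0


def pmi_ir(hits_pq, hits_p, hits_q, n_total):
    """PMI from hit counts: log2(N * hits(p, q) / (hits(p) * hits(q)))."""
    if min(hits_pq, hits_p, hits_q) <= 0:
        return 0.0  # convention for zero counts, where PMI is otherwise undefined
    return math.log2(n_total * hits_pq / (hits_p * hits_q))


a = tokenize("web information retrieval model")
b = tokenize("vector space model for web retrieval")
print(jaccard(a, b), dice(a, b), round(pearson(a, b), 4))
# Made-up hit counts standing in for search-engine results.
print(round(pmi_ir(hits_pq=100, hits_p=1000, hits_q=2000, n_total=1_000_000), 4))
```

On this toy pair, Jaccard is 3/7 and Dice is 0.6, while Pearson's coefficient is negative (about -0.35): over sparse 0/1 term vectors, Pearson can disagree with set-overlap measures. This is a property of the example, not a result reported in the paper.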

Index Terms

Computer Science
Information Sciences

Keywords

vector space model; Dice similarity; Jaccard similarity; cosine similarity; layered vector space model; Pearson's coefficient; PMI-IR; similarity measure.