Research Article

Stemming Algorithms: A Comparative Study and their Analysis

by Deepika Sharma
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 4 - Number 3
Year of Publication: 2012
Authors: Deepika Sharma
10.5120/ijais12-450655

Deepika Sharma. Stemming Algorithms: A Comparative Study and their Analysis. International Journal of Applied Information Systems 4, 3 (September 2012), 7-12. DOI=10.5120/ijais12-450655

@article{ 10.5120/ijais12-450655,
author = { Deepika Sharma },
title = { Stemming Algorithms: A Comparative Study and their Analysis },
journal = { International Journal of Applied Information Systems },
issue_date = { September 2012 },
volume = { 4 },
number = { 3 },
month = { September },
year = { 2012 },
issn = { 2249-0868 },
pages = { 7-12 },
numpages = {6},
url = { https://www.ijais.org/archives/volume4/number3/279-0655/ },
doi = { 10.5120/ijais12-450655 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Deepika Sharma
%T Stemming Algorithms: A Comparative Study and their Analysis
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 4
%N 3
%P 7-12
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Stemming is an approach used to reduce a word to its stem or root form. It is widely used in information retrieval tasks to increase recall and return more relevant results. There are a number of ways to perform stemming, ranging from manual to automatic methods and from language-specific to language-independent ones, each with its own advantages over the others. This paper presents a comparative study of various stemming alternatives widely used to enhance the effectiveness and efficiency of information retrieval.
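The core idea, conflating word variants so that one query term matches more documents, can be illustrated with a minimal Python sketch. It is not Porter, Lovins, or any algorithm compared in the paper: `suffix_stem` uses a hypothetical suffix list to stand in for a language-specific, rule-based stemmer, and `truncate` keeps the first k = 6 characters to stand in for a simple language-independent method. The suffix list, the minimum stem length, k, and the toy documents are all assumptions made for illustration.

```python
from typing import Callable, Dict, Set

# Hypothetical suffix list for illustration only: NOT the Porter or Lovins rules.
SUFFIXES = ("ations", "ation", "ions", "ion", "ings", "ing",
            "edly", "ed", "ers", "er", "es", "s")
MIN_STEM = 3  # assumed minimum stem length; stops short words being stripped away


def suffix_stem(word: str) -> str:
    """Language-specific sketch: strip the longest listed suffix (longest-match rule)."""
    w = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


def truncate(word: str, k: int = 6) -> str:
    """Language-independent sketch: keep only the first k characters (k is assumed)."""
    return word.lower()[:k]


Index = Dict[str, Set[int]]


def build_index(docs: Dict[int, str], normalize: Callable[[str], str]) -> Index:
    """Inverted index mapping each normalized term to the ids of documents containing it."""
    index: Index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(normalize(token), set()).add(doc_id)
    return index


def search(index: Index, query: str, normalize: Callable[[str], str]) -> Set[int]:
    """Normalize the query term exactly as the indexed terms were, then look it up."""
    return index.get(normalize(query), set())


docs = {
    1: "connected devices",
    2: "connecting the network",
    3: "a stable connection",
    4: "wireless networks",
}

for name, norm in [("no stemming", str.lower),
                   ("suffix_stem", suffix_stem),
                   ("truncate", truncate)]:
    idx = build_index(docs, norm)
    print(name, "-> connect matches", sorted(search(idx, "connect", norm)))
```

Without stemming, the query "connect" matches no document, because no document contains that exact token; with either conflation function it matches documents 1, 2 and 3, which is the recall gain described above. The crude rules also show the cost: `suffix_stem("wireless")` yields "wireles", an over-stemmed form, which is why practical stemmers attach conditions to each rule.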

References
  1. W. B. Frakes 1992. "Stemming Algorithms." In Information Retrieval: Data Structures and Algorithms, Chapter 8, 132–139.
  2. A. Ramanathan and D. Rao 2003. "A Lightweight Stemmer for Hindi." In Proceedings of the Workshop on Computational Linguistics for South Asian Languages, 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Budapest, April 2003.
  3. J. Savoy 2008. "Searching Strategies for the Hungarian Language." Inf. Process. Manage., 44(1), 310–324.
  4. P. McNamee and J. Mayfield 2004. "Character N-gram Tokenization for European Language Text Retrieval." Inf. Retr., 7(1–2), 73–97.
  5. D. W. Oard, G. A. Levow and C. I. Cabezas 2001. "CLEF Experiments at Maryland: Statistical Stemming and Backoff Translation." In Revised Papers from the Workshop of the Cross-Language Evaluation Forum on Cross-Language Information Retrieval and Evaluation (CLEF), Springer, London, 176–187.
  6. W. B. Frakes 1984. "Term Conflation for Information Retrieval." In Research and Development in Information Retrieval, ed. C. van Rijsbergen. New York: Cambridge University Press.
  7. W. B. Frakes 1992. "LATTIS: A Corporate Library and Information System for the UNIX Environment." Proceedings of the National Online Meeting, Medford, NJ: Learned Information Inc., 137–142.
  8. M. Hafer and S. Weiss 1974. "Word Segmentation by Letter Successor Varieties." Information Storage and Retrieval, 10, 371–385.
  9. G. Adamson and J. Boreham 1974. "The Use of an Association Measure Based on Character Structure to Identify Semantically Related Pairs of Words and Document Titles." Information Storage and Retrieval, 10, 253–260.
  10. M. F. Porter 1980. "An Algorithm for Suffix Stripping." Program, 14(3), 130–137.
  11. J. B. Lovins 1968. "Development of a Stemming Algorithm." Mechanical Translation and Computational Linguistics, 11(1–2), 22–31.
  12. V. I. Levenshtein 1966. "Binary Codes Capable of Correcting Deletions, Insertions and Reversals." Soviet Physics Doklady, 10(8), 707–710.
  13. A. K. Jain, M. N. Murty and P. J. Flynn 1999. "Data Clustering: A Review." ACM Comput. Surv., 31(3), 264–323.
  14. W. B. Frakes and C. J. Fox 2003. "Strength and Similarity of Affix Removal Stemming Algorithms." ACM SIGIR Forum, 37(1), 26–30.
  15. J. Goldsmith 2001. "Unsupervised Learning of the Morphology of a Natural Language." Comput. Linguist., 27(2), 153–198.
  16. J. Xu and W. B. Croft 1998. "Corpus-Based Stemming Using Co-occurrence of Word Variants." ACM Trans. Inf. Syst., 16(1), 61–81.
  17. M. Bacchin, N. Ferro and M. Melucci 2005. "A Probabilistic Model for Stemmer Generation." Inf. Process. Manage., 41(1), 121–137.
  18. P. Majumder, M. Mitra, S. K. Parui, G. Kole, P. Mitra and K. Datta 2007. "YASS: Yet Another Suffix Stripper." ACM Trans. Inf. Syst., 25(4), October 2007, Article 18, 5–6.
  19. J. H. Paik, M. Mitra, S. K. Parui and K. Järvelin 2011. "GRAS: An Effective and Efficient Stemming Algorithm for Information Retrieval." ACM Trans. Inf. Syst., 29(4), December 2011, Article 19, 20–24.
Index Terms

Computer Science
Information Sciences

Keywords

Information Retrieval, Stemming Algorithm, Conflation Methods