Document Clustering: A Detailed Review

Neepa Shah, Sunita Mahajan. Published in Data Mining

Document clustering is the automatic organization of documents into clusters such that documents within a cluster are highly similar to one another and dissimilar to documents in other clusters. It has been studied intensively because of its wide applicability in areas such as web mining, search engines, and information retrieval. It involves measuring the similarity between documents and grouping similar documents together. It provides an efficient representation and visualization of the documents and thus also helps in easy navigation. In this paper, we give an overview of the various document clustering methods studied and researched over the last few years, starting from basic traditional methods and extending to fuzzy-based, genetic, co-clustering, and heuristic-oriented methods, among others. We also explain the document clustering procedure, including the feature selection process, along with applications, challenges in document clustering, similarity measures, and the evaluation of document clustering algorithms.


