Legal Documents Clustering using Latent Dirichlet Allocation
Ravi Kumar V and K Raghuveer. Legal Documents Clustering using Latent Dirichlet Allocation. International Journal of Applied Information Systems 2(6):27-33, May 2012.
BibTeX
@article{key:article,
  author = "Ravi Kumar V and K. Raghuveer",
  title = "Legal Documents Clustering using Latent Dirichlet Allocation",
  journal = "International Journal of Applied Information Systems",
  year = 2012,
  volume = 2,
  number = 6,
  pages = "27-33",
  month = "May",
  note = "Published by Foundation of Computer Science, New York, USA"
}
Abstract
The growing availability of legal judgments in digital form creates both opportunities and challenges for the legal community and for information technology researchers. This development calls for assistance in organizing, analyzing, retrieving and presenting this content in a helpful and distributed manner. We propose an approach that clusters legal judgments based on topics obtained from Latent Dirichlet Allocation (LDA), using a similarity measure between topics and documents. The resulting topic-based clustering model is capable of grouping legal judgments into different clusters effectively. To the best of our knowledge, this is the first approach to cluster Indian legal judgments using the LDA topic model.
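The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of one plausible reading: each judgment is assigned to the topic it is most similar to under cosine similarity. The topic-word weights are hypothetical stand-ins for the output of an LDA run, and the names `cosine` and `assign_clusters`, the whitespace tokenization, and the sample texts are all illustrative assumptions rather than the authors' method.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def assign_clusters(docs, topics):
    """Map each document to the index of its most similar topic."""
    assignments = []
    for text in docs:
        # Term-frequency vector from naive whitespace tokenization (assumption).
        tf = Counter(text.lower().split())
        sims = [cosine(tf, topic) for topic in topics]
        assignments.append(max(range(len(topics)), key=sims.__getitem__))
    return assignments


# Hypothetical topic-word weights, standing in for topics estimated by LDA.
topics = [
    {"land": 0.4, "property": 0.3, "tenant": 0.3},
    {"murder": 0.5, "accused": 0.3, "evidence": 0.2},
]

# Hypothetical judgment snippets, not taken from the paper's data set.
docs = [
    "the tenant vacated the property after the land dispute",
    "the accused was convicted of murder on circumstantial evidence",
    "evidence showed the accused was not at the property",
]

clusters = assign_clusters(docs, topics)
print(clusters)
```

In practice the topic-word weights would come from an LDA implementation fitted on the judgment corpus, and the documents would be preprocessed (stop-word removal, stemming) before the similarity is computed.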
Keywords
Latent Dirichlet Allocation (LDA), Legal Judgments, Document Clustering, Cosine Similarity