Research Article

Data Mining in Clinical Data Sets: A Review

by Shomona Gracia Jacob, R Geetha Ramani
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 4 - Number 6
Year of Publication: 2012

Shomona Gracia Jacob, R Geetha Ramani. Data Mining in Clinical Data Sets: A Review. International Journal of Applied Information Systems. 4, 6 (December 2012), 15-26. DOI=10.5120/ijais12-450774

@article{ 10.5120/ijais12-450774,
author = { Shomona Gracia Jacob, R Geetha Ramani },
title = { Data Mining in Clinical Data Sets: A Review },
journal = { International Journal of Applied Information Systems },
issue_date = { December 2012 },
volume = { 4 },
number = { 6 },
month = { December },
year = { 2012 },
issn = { 2249-0868 },
pages = { 15-26 },
numpages = {9},
url = { },
doi = { 10.5120/ijais12-450774 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2023-07-05T10:47:31.222208+05:30
%A Shomona Gracia Jacob
%A R Geetha Ramani
%T Data Mining in Clinical Data Sets: A Review
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 4
%N 6
%P 15-26
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Data mining is one of the most extensively researched areas in computer science and information technology, owing to its wide influence on diverse fields including finance, clinical research, multimedia, and education. A substantial body of surveys and literature has been devoted to clinical data mining, an active interdisciplinary area of research that results from applying artificial intelligence and data mining concepts to medicine and health care. The aim of this work is to review the foundational principles of mining clinical datasets and to present the findings of past research on applying data mining techniques to health care data and patient records. The article gives a brief report on preceding investigations in clinical data mining, the techniques applied, and the conclusions reported. Although extensive research has already led to remarkable advances in clinical data mining and paved the way for considerable improvements in medical practice, this review also presents the most recent research findings that can further unveil the potential of data mining in health care and medicine.

Index Terms

Computer Science
Information Sciences


Clinical data mining, Outlier Detection, Feature Selection, Clustering, Classification