Research Article

Clustering Gene Expression Data using Quad Tree based Expectation Maximization Approach

by Leela Rani.p, Rajalakshmi.p
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 2 - Number 8
Year of Publication: 2012
Authors: Leela Rani.p, Rajalakshmi.p
DOI: 10.5120/ijais12-450399

Leela Rani.p, Rajalakshmi.p. Clustering Gene Expression Data using Quad Tree based Expectation Maximization Approach. International Journal of Applied Information Systems. 2, 8 (June 2012), 10-13. DOI=10.5120/ijais12-450399

@article{ 10.5120/ijais12-450399,
author = { Leela Rani.p and Rajalakshmi.p },
title = { Clustering Gene Expression Data using Quad Tree based Expectation Maximization Approach },
journal = { International Journal of Applied Information Systems },
issue_date = { June 2012 },
volume = { 2 },
number = { 8 },
month = { June },
year = { 2012 },
issn = { 2249-0868 },
pages = { 10-13 },
numpages = {9},
url = { https://www.ijais.org/archives/volume2/number8/184-0399/ },
doi = { 10.5120/ijais12-450399 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2023-07-05T10:43:56.472923+05:30
%A Leela Rani.p
%A Rajalakshmi.p
%T Clustering Gene Expression Data using Quad Tree based Expectation Maximization Approach
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 2
%N 8
%P 10-13
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In molecular biology, microarrays are used to monitor the expression levels of many genes simultaneously. They are applied in gene expression analysis, genome mapping, toxicity studies, pathogen identification and other biological applications. Clustering groups similar gene expression profiles together, which helps identify co-expressed genes and biologically relevant groupings of genes, an important research area in bioinformatics. In this paper, a Quad Tree based Expectation Maximization (EM) algorithm is applied to cluster gene expression data. The Quad Tree is used to initialize the cluster centroids, and EM, which computes maximum likelihood estimates from incomplete samples, then uses these centroids to group the data. The Silhouette measure, a method for interpreting and validating clusters, represents how well each object lies within its cluster. Experimental results show that the Quad Tree based EM algorithm finds more compact clusters than the K-Means algorithm.
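
The pipeline summarized in the abstract (initial centroids, then EM, then Silhouette validation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes spherical Gaussian components, Euclidean distance, and a toy 2-D data set. Because the abstract does not specify the Quad Tree procedure, the initial centroids are supplied by hand as a stand-in for its output. The function names `em_spherical` and `silhouette` are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def em_spherical(points, centroids, iters=50, min_var=1e-6):
    """EM for a mixture of spherical Gaussians, started from given centroids.

    Assumption: one variance per component, shared across dimensions.
    """
    k, d, n = len(centroids), len(points[0]), len(points)
    means = [list(c) for c in centroids]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each point,
        # computed in log space to avoid underflow.
        resp = []
        for p in points:
            logs = [math.log(weights[j])
                    - 0.5 * d * math.log(2 * math.pi * variances[j])
                    - sq_dist(p, means[j]) / (2 * variances[j])
                    for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # effectively empty component: keep old parameters
            weights[j] = nj / n
            means[j] = [sum(r[j] * p[i] for r, p in zip(resp, points)) / nj
                        for i in range(d)]
            var = sum(r[j] * sq_dist(p, means[j])
                      for r, p in zip(resp, points)) / (nj * d)
            variances[j] = max(var, min_var)
    # Hard labels: the most probable component for each point.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, labels

def silhouette(points, labels):
    """Mean silhouette, with s(i) = (b - a) / max(a, b) per point."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, points[j]) for j in members) / len(members)
                for lab, members in clusters.items() if lab != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy data (made up for illustration): two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
init = [(0.5, 0.5), (4.5, 4.5)]  # hand-picked stand-in for Quad Tree centroids
means, labels = em_spherical(points, init)
score = silhouette(points, labels)
print(labels, round(score, 3))
```

A mean silhouette close to 1 indicates compact, well-separated clusters. The same `silhouette` function could be applied to K-Means labels to make the kind of comparison the abstract reports.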

Index Terms

Computer Science
Information Sciences

Keywords

Clustering, Quad Tree, Expectation Maximization Algorithm, K-means, Silhouette Measure, Similarity