Proposing an Improved Semantic and Syntactic Data Quality Mining Method using Clustering and Fuzzy Techniques

Hamid Reza Khosravani
Published in Artificial Intelligence

International Journal of Applied Information Systems
Year of Publication 2012
© 2010 by IJAIS Journal
http:/ijais12-450475
  1. Hamid Reza Khosravani. Article: Proposing an Improved Semantic and Syntactic Data Quality Mining Method using Clustering and Fuzzy Techniques. International Journal of Applied Information Systems 3(3):8-12, July 2012.

BibTeX

    @article{key:article,
    	author = "Hamid Reza Khosravani",
    	title = "Article: Proposing an Improved Semantic and Syntactic Data Quality Mining Method using Clustering and Fuzzy Techniques",
    	journal = "International Journal of Applied Information Systems",
    	year = 2012,
    	volume = 3,
    	number = 3,
    	pages = "8-12",
    	month = "July",
    	note = "Published by Foundation of Computer Science, New York, USA"
    }
    

Abstract

Data quality plays an important role in the knowledge discovery process in databases. Researchers have so far proposed two approaches to data quality evaluation. The first is based on statistical methods; the second uses data mining techniques, which improve evaluation results by relying on extracted knowledge. Our proposed method follows the second approach and focuses on the accuracy dimension of data quality, covering both its syntactic and semantic aspects.

Existing data mining techniques evaluate the data quality of relational database records based only on association rules extracted from their categorical features. Since real-world data contains both categorical and numerical features, the main problem with these methods is that numerical features are ignored. The method proposed in this paper, which relies on clustering records, overcomes this problem.
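
As an illustration of how records with both feature types can be clustered, the sketch below uses a k-prototypes-style distance: squared Euclidean distance on numerical fields plus a penalty for each categorical mismatch. This is not taken from the paper; the function names, the `gamma` trade-off parameter, and the assumption that numerical fields are already scaled to comparable ranges are all hypothetical.

```python
from typing import Sequence


def mixed_distance(a: Sequence, b: Sequence,
                   numeric_idx: Sequence[int],
                   categorical_idx: Sequence[int],
                   gamma: float = 1.0) -> float:
    """Distance between two records with mixed field types.

    Numerical fields contribute squared differences; each categorical
    field contributes gamma when the two values differ (assumed scheme).
    """
    numeric_part = sum((float(a[i]) - float(b[i])) ** 2 for i in numeric_idx)
    categorical_part = sum(1 for i in categorical_idx if a[i] != b[i])
    return numeric_part + gamma * categorical_part


def assign_to_nearest(record: Sequence, prototypes: Sequence[Sequence],
                      numeric_idx: Sequence[int],
                      categorical_idx: Sequence[int],
                      gamma: float = 1.0) -> int:
    """Return the index of the cluster prototype closest to the record."""
    distances = [mixed_distance(record, p, numeric_idx, categorical_idx, gamma)
                 for p in prototypes]
    return min(range(len(prototypes)), key=distances.__getitem__)


# Hypothetical records: (normalized age, city)
prototypes = [(0.2, "Tehran"), (0.9, "Delhi")]
cluster = assign_to_nearest((0.25, "Tehran"), prototypes, [0], [1])
```

Because the numerical term enters the distance directly, a record's numerical fields influence which cluster it joins, which a purely categorical rule-based approach would not capture.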

In this method, we extract a describing rule for each cluster of records and assign a weight to each field of a record to reflect its importance in data quality evaluation. The method evaluates data quality hierarchically, based on three defined criteria. Simulation results show that the proposed method improves data quality evaluation of relational database records to an acceptable degree.
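
The idea of field weights can be sketched as a weighted conformance score. The abstract does not give the rule format, the weighting scheme, or the three criteria, so everything below is an assumption made for illustration: a cluster's describing rule is modeled as one predicate per field, and a record's score is the weighted share of fields that satisfy their predicate.

```python
from typing import Any, Callable, Dict


def record_quality(record: Dict[str, Any],
                   rule: Dict[str, Callable[[Any], bool]],
                   weights: Dict[str, float]) -> float:
    """Weighted share of fields that satisfy the cluster's describing rule.

    Returns a value in [0, 1]; 1.0 means every weighted field conforms.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    satisfied = sum(w for field, w in weights.items()
                    if rule[field](record[field]))
    return satisfied / total


# Hypothetical rule for one cluster and hypothetical field weights.
rule = {
    "age": lambda v: 18 <= v <= 30,
    "city": lambda v: v in {"Tehran", "Shiraz"},
}
weights = {"age": 2.0, "city": 1.0}

score = record_quality({"age": 25, "city": "Delhi"}, rule, weights)
```

In this example the record satisfies the `age` predicate but not the `city` predicate, so it earns two of the three available weight units.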

Keywords

Data Quality Mining, Association Rules, Categorical Feature, Numerical Feature