Data Quality Assistance - The Use of Data Mining Algorithms to Enhance Data Quality
Keywords:
KDD, Data Mining, Duplicate, Outlier, Association Rule
Abstract
Large databases that have grown over the years are a persistent concern in the field of data quality. Data sets grow over time from multiple sources and are maintained by various users, so data quality is one of the key issues that must be considered. This paper introduces a further development of an interactive data mining assistance system for ensuring high-quality data. What exactly is data quality? In our approach, data quality means that the data fulfill specific requirements. As a first step, data mining algorithms are used to find outliers and duplicates. In the next step, the data mining assistance system generates rules that describe the whole data set. Furthermore, rule administration is part of the concept: interesting rules found within the data set through the application of various data mining techniques are to be added at this stage. The system therefore serves to store and review rules that can be applied to the decision support system. Various algorithms from the field of data mining are used to generate these rules, which have to be evaluated by experts to determine whether they can be applied to the decision support system as suggestion rules.
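The abstract does not name the outlier-detection algorithm the assistance system uses. As a minimal sketch, assuming a single numeric attribute and Tukey's interquartile-range (IQR) fences as an illustrative stand-in technique, flagging outlier candidates for review could look like this; the function name iqr_outliers and the sample column are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule is an illustrative assumption; it is not necessarily the
    outlier-detection algorithm used by the assistance system.
    """
    # With n=4, quantiles() returns the three cut points Q1, median, Q3
    # (using its default "exclusive" method).
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical numeric column with one suspicious entry at index 6.
    column = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 98.0, 12.3]
    print(iqr_outliers(column))
```

The factor k = 1.5 is the conventional choice for Tukey's fences. Values flagged this way are candidates for expert review rather than confirmed errors.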
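Duplicate detection can be sketched in the same spirit. The following assumes the sorted-neighborhood (windowing) method, in which records are sorted by a key and only records inside a sliding window are compared, combined with the similarity ratio from Python's difflib. The record schema (name, city), the window size, and the similarity threshold are illustrative assumptions, not the configuration described in the paper.

```python
from difflib import SequenceMatcher


def comparison_key(record):
    """Build a normalized key from a record (hypothetical schema: name, city)."""
    return f"{record['name']} {record['city']}".lower().strip()


def sorted_neighborhood(records, window=3, threshold=0.85):
    """Return index pairs of likely duplicates (sorted-neighborhood method).

    Records are sorted by their key, and each record is compared only with
    the next window - 1 records in that order instead of with all others.
    """
    keys = [comparison_key(r) for r in records]
    order = sorted(range(len(records)), key=lambda i: keys[i])
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            similarity = SequenceMatcher(None, keys[i], keys[j]).ratio()
            if similarity >= threshold:
                pairs.append(tuple(sorted((i, j))))
    return pairs


if __name__ == "__main__":
    records = [
        {"name": "Anna Schmidt", "city": "Berlin"},
        {"name": "Peter Meyer", "city": "Hamburg"},
        {"name": "Anna Schmitt", "city": "Berlin"},  # likely a typo duplicate of record 0
        {"name": "Zoe Wagner", "city": "Munich"},
    ]
    print(sorted_neighborhood(records))
```

Windowing reduces the number of comparisons from quadratic to roughly n × (window − 1), at the cost of missing duplicates whose keys sort far apart.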
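Rule generation and rule administration could be combined as follows. This sketch assumes simple one-to-one association rules of the form "A = x → B = y", filtered by minimum support and confidence, and a hypothetical RuleStore in which an expert marks each mined rule as accepted or rejected. Only accepted rules are exported as candidate suggestion rules for a decision support system. The class and function names, the thresholds, and the sample rows are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import permutations


@dataclass
class Rule:
    antecedent: tuple  # (attribute, value) on the left-hand side
    consequent: tuple  # (attribute, value) on the right-hand side
    support: float
    confidence: float
    status: str = "proposed"  # set to "accepted" or "rejected" by an expert


def mine_rules(rows, min_support=0.3, min_confidence=0.9):
    """Mine one-to-one rules 'A = x -> B = y' from rows of categorical values."""
    n = len(rows)
    item_counts = Counter()  # how often each (attribute, value) occurs
    pair_counts = Counter()  # how often two items occur in the same row
    for row in rows:
        items = sorted(row.items())
        item_counts.update(items)
        pair_counts.update(permutations(items, 2))  # ordered pairs: both directions
    rules = []
    for (lhs, rhs), count in pair_counts.items():
        support = count / n
        confidence = count / item_counts[lhs]
        if support >= min_support and confidence >= min_confidence:
            rules.append(Rule(lhs, rhs, support, confidence))
    return rules


class RuleStore:
    """Hypothetical rule administration: stores mined rules for expert review."""

    def __init__(self):
        self.rules = []

    def add(self, rules):
        self.rules.extend(rules)

    def review(self, index, accept):
        self.rules[index].status = "accepted" if accept else "rejected"

    def exported(self):
        """Return only expert-approved rules (candidate suggestion rules)."""
        return [r for r in self.rules if r.status == "accepted"]


if __name__ == "__main__":
    rows = [
        {"zip": "76131", "city": "Karlsruhe"},
        {"zip": "76131", "city": "Karlsruhe"},
        {"zip": "76131", "city": "Karlsruhe"},
        {"zip": "10115", "city": "Berlin"},
        {"zip": "10115", "city": "Berlin"},
    ]
    store = RuleStore()
    store.add(mine_rules(rows))
    for i, rule in enumerate(store.rules):
        print(i, rule.antecedent, "->", rule.consequent, rule.confidence)
    store.review(1, accept=True)  # an expert approves one rule
    print(store.exported())
```

Keeping the review status on each rule separates mining from approval: mining may propose many rules, but only those an expert accepts reach the decision support system.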
License
TRANSFER OF COPYRIGHT AGREEMENT
The manuscript is herewith submitted for publication in the Journal of Telecommunication, Electronic and Computer Engineering (JTEC). It has not been published before, and it is not under consideration for publication in any other journals. It contains no material that is scandalous, obscene, libelous or otherwise contrary to law. When the manuscript is accepted for publication, I, as the author, hereby agree to transfer to JTEC all rights, including those pertaining to electronic forms and transmissions, under existing copyright laws, except for the following, which the author(s) specifically retain(s):
- All proprietary rights other than copyright, such as patent rights;
- The right to make further copies of all or part of the published article for my use in classroom teaching;
- The right to reuse all or part of this manuscript in a compilation of my own works or in a textbook of which I am the author; and
- The right to make copies of the published work for internal distribution within the institution that employs me.
I agree that copies made under these circumstances will continue to carry the copyright notice that appears in the original published work. I agree to inform my co-authors, if any, of the above terms. I certify that I have obtained written permission for the use of text, tables, and/or illustrations from any copyrighted source(s), and I agree to supply such written permission(s) to JTEC upon request.