Data Quality Enhancement for Decision Tree Algorithm using Knowledge-Based Model

Abstract

Data mining is an approach to discovering knowledge or previously hidden patterns in large data sets using methods such as statistics, machine learning and other data analysis techniques. However, the main limitation of these conventional techniques is that they disregard data relationships and semantics: the data are treated as meaningless numbers, and statistical methods are used for model building. For example, the decision tree, a classification method in data mining, is produced from a given set of labeled data, and those data are classified without understanding their semantics or the relationships between attributes. To capture the inherent meaning of the data and to take advantage of the relationships between data elements, we introduce a knowledge-based approach to improving data quality. The proposed approach uses an ontology as background knowledge to assist decision tree classification during data preparation. The ontology is used to infer the relationships between attributes and concepts. This relationship information helps the system identify related attributes that could aid the classification process. Two datasets from different domains, agriculture and economics, were used to evaluate the generalization of the proposed approach. Accuracy was used as the standard measure of success in evaluating the model. The experimental results showed that the proposed approach can effectively enhance the performance of the data classification process.

Keywords: data analytics; data mining; ontology; semantic; classification; decision tree
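
As an illustration only, the idea of using an ontology during data preparation can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the is-a hierarchy, the attribute names (crop, yield), the sample records and the helper functions ancestors, enrich and information_gain are all hypothetical. The sketch adds a derived attribute that groups each value under its parent concept, then measures how well that derived attribute separates the class labels using the entropy-based information gain commonly used to choose decision tree splits.

```python
from collections import Counter
from math import log2

# Hypothetical is-a hierarchy (child -> parent); not taken from the paper.
IS_A = {
    "rice": "cereal",
    "maize": "cereal",
    "cassava": "root_crop",
    "potato": "root_crop",
    "cereal": "crop",
    "root_crop": "crop",
}


def ancestors(concept, is_a=IS_A):
    """Return the concepts reached from `concept` by following is-a links."""
    chain = []
    while concept in is_a:
        concept = is_a[concept]
        chain.append(concept)
    return chain


def enrich(records, attribute, is_a=IS_A):
    """Add `<attribute>_group`: the direct parent concept of each value."""
    enriched = []
    for record in records:
        parents = ancestors(record[attribute], is_a)
        # Fall back to the raw value when the ontology has no parent for it.
        group = parents[0] if parents else record[attribute]
        enriched.append({**record, attribute + "_group": group})
    return enriched


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(records, attribute, target):
    """Entropy reduction on `target` obtained by splitting on `attribute`."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder


# Hypothetical labeled records; the class label is "yield".
records = [
    {"crop": "rice", "yield": "high"},
    {"crop": "maize", "yield": "high"},
    {"crop": "cassava", "yield": "low"},
    {"crop": "potato", "yield": "low"},
]
enriched = enrich(records, "crop")
gain = information_gain(enriched, "crop_group", "yield")
```

In this toy data the derived crop_group attribute separates the two classes with two branches instead of one branch per crop, which is the kind of related attribute the ontology is meant to surface. A real pipeline would pass the enriched records to a decision tree learner.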

*Corresponding author: Tel.: +66 81 555 7499

E-mail: kraisakk@nu.ac.th

How to Cite

Citation Format

Chanmee, S., & Kesorn, K. (2020). Data Quality Enhancement for Decision Tree Algorithm using Knowledge-Based Model. Current Applied Science and Technology, 20(2), 259-277.

Author Information

Sirichanya Chanmee

Department of Computer Science and Information Technology, Faculty of Science, Naresuan University, Phitsanulok, Thailand

Kraisak Kesorn*

Department of Computer Science and Information Technology, Faculty of Science, Naresuan University, Phitsanulok, Thailand

About this Article

Journal

Vol. 20 No. 2 (2020)

Type of Manuscript

Original Research Article

Published

23 March 2020
