This paper describes the development of classification models for inbound tourism forms in Thailand. The models used both a labeled data set and an originally unlabeled one. The latter was obtained from the Ministry of Tourism and Sports of Thailand, which regularly collects unlabeled data, so tourism form labels had to be synthesized before it could be used for classification. To synthesize these labels, we propose a cluster-to-class mapping algorithm consisting of three steps. First, we search for the best tourist clustering model on the unlabeled tourist data set by comparing the results of K-means, hierarchical cluster analysis, random clustering, and DBSCAN. Second, we map the clusters to the classes of the labeled data set based on Euclidean similarity, which reveals the tourism form label of each cluster. Finally, we search for the best tourism-form classification model on the data sets with real and synthesized labels, using Naïve Bayes, support vector machine, linear regression, and decision tree techniques. Experimental results show that the algorithm generates tourism form labels effectively: using them, we obtained a neural network model that predicts the inbound tourism forms of an unseen tourist data set with an F-measure as high as 98.99%.
Keywords: tourism form; classification algorithm; clustering algorithm; cluster-to-class mapping
*Corresponding author: Tel: +66(0)2942 8200-45
E-mail: thepparit.b@ku.th
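
The second step of the algorithm summarized in the abstract, mapping clusters to classes, can be illustrated with a minimal sketch. It assumes that "Euclidean similarity" means the Euclidean distance between a cluster centroid and a class centroid, and that each cluster takes the label of the nearest class; the abstract does not specify these details. The function names, the toy feature vectors, and the tourism form names ("cultural", "eco") are hypothetical and serve only as illustration.

```python
import numpy as np


def centroids(X, groups):
    """Return the sorted unique group ids and the mean feature vector of each group."""
    ids = np.unique(groups)
    return ids, np.vstack([X[groups == g].mean(axis=0) for g in ids])


def map_clusters_to_classes(X_unlabeled, cluster_ids, X_labeled, class_labels):
    """Map each cluster to the class whose centroid is nearest in Euclidean distance.

    Assumption: nearest-centroid matching stands in for the paper's
    "Euclidean similarity"; the actual criterion may differ.
    """
    clusters, cluster_cent = centroids(X_unlabeled, cluster_ids)
    classes, class_cent = centroids(X_labeled, class_labels)
    # Pairwise distances: rows are clusters, columns are classes.
    dist = np.linalg.norm(cluster_cent[:, None, :] - class_cent[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    return {c.item(): classes[j].item() for c, j in zip(clusters, nearest)}


# Hypothetical labeled tourists (two features each) with known tourism forms.
X_labeled = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
class_labels = np.array(["cultural", "cultural", "eco", "eco"])

# Hypothetical unlabeled tourists, already grouped by a clustering model (step one).
X_unlabeled = np.array([[4.8, 5.2], [5.3, 5.0], [0.1, -0.1], [-0.2, 0.3]])
cluster_ids = np.array([0, 0, 1, 1])

mapping = map_clusters_to_classes(X_unlabeled, cluster_ids, X_labeled, class_labels)

# Synthesized labels for the unlabeled tourists, ready for classifier training (step three).
synthesized = [mapping[c] for c in cluster_ids.tolist()]
print(mapping)
print(synthesized)
```

In this sketch several clusters may map to the same class, and a class may receive no cluster at all; whether the proposed algorithm enforces a one-to-one mapping is not stated in the abstract.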