Thai Text Segmentation Using Vowel-Centered Rules and Learning

Original Research Article | Mar 30, 2018 | Vol. 6 No. 2a (2006)

Abstract

The vast majority of text-processing algorithms share one assumption: that the input text is a sequence of words. In languages such as Thai, where word boundaries are not always explicit, text segmentation is therefore an issue of interest. This work presents a two-step algorithm for Thai text segmentation. The first step chops the input text into pieces centered around the vowels. The second step defines a set of features that might help determine whether two consecutive pieces from the first step belong together as a unit (word, syllable, etc.), and then uses learning algorithms to build a model from these features. Applying this model to an input text yields a sequence of units, each small (a few syllables) yet useful enough for further processing by other word-based algorithms.
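The two steps can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the chopping rules or the feature set, so everything here is an assumption made for illustration. That covers the function names (`chop`, `pair_features`, `merge`), the rule that a piece ends once a vowel has been seen and the next base character arrives, and the features themselves. The learned model (the keywords mention decision trees and C4.5) is represented only by a caller-supplied function `should_merge`.

```python
# Thai code points, grouped by how they attach to a consonant.
LEADING = {chr(c) for c in range(0x0E40, 0x0E45)}    # vowels written before the consonant
TRAILING = {chr(c) for c in range(0x0E30, 0x0E3A)}   # vowels written after, above or below
TRAILING.add(chr(0x0E47))                            # maitaikhu, treated as a vowel sign here
MARKS = {chr(c) for c in range(0x0E48, 0x0E4D)}      # tone marks and thanthakhat


def chop(text):
    """Step 1 (assumed rules): cut the text into pieces holding at most one vowel group."""
    pieces, current = [], ""
    has_vowel = False        # the current piece already contains a vowel
    needs_consonant = False  # a leading vowel is still waiting for its consonant
    for ch in text:
        if ch in LEADING:
            # A leading vowel always opens a new piece.
            if current:
                pieces.append(current)
            current, has_vowel, needs_consonant = ch, True, True
        elif ch in TRAILING or ch in MARKS:
            # Vowel signs and marks attach to whatever precedes them.
            current += ch
            has_vowel = has_vowel or ch in TRAILING
        elif has_vowel and not needs_consonant:
            # Any other character after a completed vowel starts the next piece.
            pieces.append(current)
            current, has_vowel = ch, False
        else:
            current += ch
            needs_consonant = False
    if current:
        pieces.append(current)
    return pieces


def contains_vowel(piece):
    return any(ch in LEADING or ch in TRAILING for ch in piece)


def pair_features(left, right):
    """Step 2 (hypothetical features) describing two consecutive pieces."""
    return {
        "left_len": len(left),
        "right_len": len(right),
        "left_has_vowel": contains_vowel(left),
        "right_has_vowel": contains_vowel(right),
        "right_starts_with_leading_vowel": right[0] in LEADING,
    }


def merge(pieces, should_merge):
    """Join consecutive pieces whenever should_merge(features) returns True."""
    units, prev = [], None
    for piece in pieces:
        if prev is not None and should_merge(pair_features(prev, piece)):
            units[-1] += piece
        else:
            units.append(piece)
        prev = piece
    return units
```

A trained classifier would normally supply `should_merge`. As a stand-in, a hand-written rule such as `lambda f: not f["right_has_vowel"]` attaches a bare final consonant to the piece before it, so `merge(chop(text), rule)` returns the resulting units. Characters outside the three sets above, including spaces and Latin letters, are treated like consonants in this sketch.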

Keywords: Thai, Text, Segmentation, Learning, Decision Trees, C4.5

Corresponding author: E-mail: patrawadee@as.nida.ac.th

 

How to Cite

Tanawongsuwan, P. (2018). Thai Text Segmentation Using Vowel-Centered Rules and Learning. CURRENT APPLIED SCIENCE AND TECHNOLOGY, 305-311.

Author Information

Patrawadee Tanawongsuwan*

National Institute of Development Administration, Bangkok, Thailand
