The paper presents a new Thai-text transform algorithm that improves compression by using a list of frequently used Thai words/phrases. The approach increases redundancy in the text by encoding it into an intermediate form. The encoding scheme assigns a fixed-length code to each frequently used Thai word/phrase in the list and substitutes occurrences of those words/phrases in the text with their codes. Algorithm performance is measured in terms of compression ratio. The experiments use three major implementations. The first includes all 511 frequently used Thai words/phrases and assigns a three-byte code to each. The second uses a two-byte code because it covers only the 255 most frequently used words/phrases. The third covers the 109 most frequently used words/phrases and assigns a one-byte code to each. In the experiments, each text and its transformed version were given as input to standard compression programs. The results show that the transformed text achieves a significantly better compression ratio than the original text.
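The abstract does not specify how words are located in the text or how codes are laid out in bytes, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes greedy longest-match substitution over the character stream (Thai is written without spaces between words), UTF-8 input, and a hypothetical two-byte code made of an escape byte `0xFF` (which never occurs in valid UTF-8) followed by a one-byte index. The eight-word list `FREQUENT_WORDS` and all helper names are illustrative, not the paper's 511-word list. Python's `zlib` and `bz2` modules stand in for the "standard compression programs", and the compression ratio is assumed to be original size divided by compressed size.

```python
import bz2
import zlib

# Hypothetical list of frequent Thai words, most frequent first.
# Illustrative only; this is NOT the paper's word/phrase list.
FREQUENT_WORDS = ["ที่", "และ", "การ", "เป็น", "ไม่", "ได้", "ของ", "ใน"]

# Assumed code layout: 0xFF never appears in valid UTF-8, so it can mark a code.
ESCAPE = 0xFF


def build_codebook(words):
    """Map each word to a fixed-length two-byte code: ESCAPE + index."""
    if len(words) > 256:
        raise ValueError("a one-byte index supports at most 256 words")
    return {w: bytes([ESCAPE, i]) for i, w in enumerate(words)}


def transform(text, words):
    """Replace dictionary words with codes using greedy longest match."""
    codebook = build_codebook(words)
    by_length = sorted(words, key=len, reverse=True)
    out = bytearray()
    i = 0
    while i < len(text):
        for w in by_length:
            if text.startswith(w, i):
                out += codebook[w]
                i += len(w)
                break
        else:
            # No dictionary word starts here: copy one character as UTF-8.
            out += text[i].encode("utf-8")
            i += 1
    return bytes(out)


def inverse_transform(data, words):
    """Restore the original text from the intermediate form."""
    parts = []
    literal = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESCAPE:
            # Flush pending literal bytes, then expand the code's index.
            parts.append(literal.decode("utf-8"))
            literal.clear()
            parts.append(words[data[i + 1]])
            i += 2
        else:
            literal.append(data[i])
            i += 1
    parts.append(literal.decode("utf-8"))
    return "".join(parts)


def compression_ratio(original_size, compressed):
    """Assumed definition: original size / compressed size (higher is better)."""
    return original_size / len(compressed)


if __name__ == "__main__":
    text = "การศึกษาเป็นสิ่งที่สำคัญและการพัฒนาไม่ได้หยุดอยู่ที่ใด" * 50
    original = text.encode("utf-8")
    encoded = transform(text, FREQUENT_WORDS)
    assert inverse_transform(encoded, FREQUENT_WORDS) == text
    for name, compress in (("zlib", zlib.compress), ("bz2", bz2.compress)):
        # Both ratios are measured against the size of the original text.
        before = compression_ratio(len(original), compress(original))
        after = compression_ratio(len(original), compress(encoded))
        print(f"{name}: original {before:.2f}, transformed {after:.2f}")
```

Since a Thai character takes three bytes in UTF-8 and each code here takes two, every substitution shortens the intermediate form; whether a given compressor then achieves a better ratio depends on the text, which is what the paper's experiments measure.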
Corresponding author: E-mail: cast@kmitl.ac.th
Sermkawinrak, K., Intakosum, S., & Boonjing, V. (2018). Thai Text Transformation for Compression. Current Applied Science and Technology, 379-384.
