Thai Text Transformation for Compression

Original Research Article. Published: 12 November 2018. Vol. 5 No. 1 (2005)

Abstract

This paper presents a new Thai-text transform algorithm that improves compression by using a list of frequently used Thai words and phrases. The approach increases redundancy in the text by encoding it into an intermediate form: the encoding scheme substitutes each listed word or phrase in the text with a fixed-length code. Algorithm performance is measured in terms of compression ratio. Three implementations were tested. The first includes all 511 frequently used Thai words/phrases and assigns a three-byte code to each. The second covers only the 255 most frequently used words/phrases and assigns a two-byte code to each. The third covers only the 109 most frequently used words/phrases and assigns a one-byte code to each. In the experiment, each text and its transformed version were given as input to standard compression programs. The results show that the transformed text achieves a significantly better compression ratio than the original text.
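
The substitution idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The phrase list, the use of Unicode Private Use Area code points as fixed-length codes, the choice of `zlib` as the "standard compression program", and the definition of compression ratio as compressed size divided by original size are all assumptions made for illustration; the abstract does not specify any of them.

```python
import re
import zlib

# Hypothetical stand-in for the paper's list of frequently used Thai
# words/phrases; the real list is not given in the abstract.
FREQUENT_PHRASES = ["ประเทศไทย", "และ", "ของ", "ใน", "การ", "ที่"]

# Assumption: each phrase gets one code point from the Unicode Private Use
# Area, so every code has the same length and cannot collide with Thai text.
PUA_START = 0xE000


def build_tables(phrases):
    """Return (phrase -> code, code -> phrase) lookup tables."""
    to_code = {p: chr(PUA_START + i) for i, p in enumerate(phrases)}
    from_code = {c: p for p, c in to_code.items()}
    return to_code, from_code


def transform(text, to_code):
    """Replace every listed phrase in the text with its fixed-length code."""
    if any(code in text for code in to_code.values()):
        raise ValueError("input already contains a reserved code character")
    # Try longer phrases first so a short word never splits a longer phrase.
    ordered = sorted(to_code, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in ordered))
    return pattern.sub(lambda m: to_code[m.group(0)], text)


def inverse_transform(text, from_code):
    """Restore the original text by expanding each code back to its phrase."""
    return "".join(from_code.get(ch, ch) for ch in text)


def compression_ratio(payload, original):
    """Compressed size of payload divided by the size of the original text."""
    compressed = zlib.compress(payload.encode("utf-8"), 9)
    return len(compressed) / len(original.encode("utf-8"))


if __name__ == "__main__":
    to_code, from_code = build_tables(FREQUENT_PHRASES)
    sample = "ภาษาไทยเป็นภาษาราชการของประเทศไทย และใช้ในการศึกษา"
    encoded = transform(sample, to_code)
    assert inverse_transform(encoded, from_code) == sample
    print("original ratio:   ", compression_ratio(sample, sample))
    print("transformed ratio:", compression_ratio(encoded, sample))
```

Both ratios are computed against the size of the original text so that they are directly comparable. On a sample this short the numbers say little; the abstract's claim concerns experiments on full texts.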

Keywords:  -

Corresponding author: E-mail: cast@kmitl.ac.th

How to Cite

Sermkawinrak, K., Intakosum, S., & Boonjing, V. (2018). Thai Text Transformation for Compression. Current Applied Science and Technology, 379-384.

Author Information

K. Sermkawinrak

Software Systems Engineering Laboratory, Department of Mathematics and Computer Science, Faculty of Science, King Mongkut’s Institute of Technology Ladkrabang (KMITL), Bangkok, Thailand

S. Intakosum

Software Systems Engineering Laboratory, Department of Mathematics and Computer Science, Faculty of Science, King Mongkut’s Institute of Technology Ladkrabang (KMITL), Bangkok, Thailand

V. Boonjing

Software Systems Engineering Laboratory, Department of Mathematics and Computer Science, Faculty of Science, King Mongkut’s Institute of Technology Ladkrabang (KMITL), Bangkok, Thailand
