13.07.2015 Views

A corpus-based approach for thai romanization.pdf - NAiST


1. TCC segmentation

A Thai word is segmented using the concept of Thai character cluster (TCC). TCC is an unambiguous group of Thai characters defined by a set of rules. The rules used in TCC are simply computerized versions of Royal Institute's rules. TCC represents an inseparable unit of Thai characters used in composing words.

e.g.
ชลบุรี    ช | ล | บุ | รี
พระนคร    พ | ระ | น | ค | ร
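The clustering idea can be illustrated with a minimal Python sketch. It is not the paper's rule set: the full TCC rules are not given in this excerpt, so the three rules below are a reduced approximation chosen to reproduce the two examples above. The function name `tcc_segment`, the sets `LEADING_VOWELS` and `FOLLOWING_VOWELS`, and the decision to treat า and ำ like ะ are all assumptions made for illustration.

```python
import unicodedata

# Assumption: vowels written before the consonant they belong to.
LEADING_VOWELS = set("เแโใไ")
# Assumption: vowels written after a consonant that stay attached to it.
# Only ะ is confirmed by the examples in the text; า and ำ are added by analogy.
FOLLOWING_VOWELS = set("ะาำ")


def tcc_segment(word):
    """Split a Thai word into approximate character clusters.

    Simplified rules (a sketch, not the Royal Institute rule set):
      1. A leading vowel starts a cluster and absorbs the next character.
      2. A non-spacing mark (upper/lower vowel, tone mark, etc.) joins
         the cluster before it.
      3. A following vowel joins the cluster before it.
    Any other character starts a new cluster.
    """
    clusters = []
    attach_next = False  # True right after a leading vowel
    for ch in word:
        is_mark = unicodedata.category(ch) == "Mn"
        if clusters and (attach_next or is_mark or ch in FOLLOWING_VOWELS):
            clusters[-1] += ch
        else:
            clusters.append(ch)
        attach_next = ch in LEADING_VOWELS
    return clusters


print(" | ".join(tcc_segment("ชลบุรี")))
print(" | ".join(tcc_segment("พระนคร")))
```

Relying on the Unicode category `Mn` avoids listing every upper vowel, lower vowel, and tone mark by hand. A fuller implementation would need the multi-character vowel patterns that this sketch leaves out.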
