12.07.2015 Views

file - ChaSen - 奈良先端科学技術大学院大学


Features of context for chunking a sentence s5 with B-W tag (window size = 3)

                        Group A: m frequent POS            Group B: n POS at end of sentence
                        feature 1  feature 2  ...  feature m   feature m+1  ...  feature m+n   chunk tag
s1  Commuting by...     w1         w2         ...  wm          w1,m+1       ...  w1,m+n        O
s2  To employees...     w1         w2         ...  wm          w2,m+1       ...  w2,m+n        B-D
s3  My company ...      nil        nil        ...  nil         w3,m+1       ...  w3,m+n        O
s4  Managers ...        nil        nil        ...  nil         w4,m+1       ...  w4,m+n        O
s5  ..                  w1         nil        ...  wm          w5,m+1       ...  w5,m+n        B-W
s6  Do you know...      w1         nil        ...  wm          w6,m+1       ...  w6,m+n        I-W
s7  If you have ...     w1         w2         ...  wm          w7,m+1       ...  w7,m+n        O

Figure 3.6. Example of Data Format in Learning and Testing of Chunking.

features in the learning corpus, a thousand frequent parts of speech are stored. Besides this experiment, an experiment exploiting only several words at the beginning and end of each sentence was performed. The reason is that symbols and function words at the end of a sentence, such as question marks and auxiliaries, are expected to be effective for extracting question segments, and interrogatives at the beginning of a sentence are expected to work well for question type identification. The number of parts of speech exploited at the beginning and end of a sentence varied from one to five.

The chunk tag sets comprising the four types mentioned in Section 3.4.1, and an IO tag set that does not distinguish two adjacent question segments, are used for chunking. As the chunker implementing CRF, I used CRF++ [1] supported by Kudo. The learning parameters were set to their default values.

The features used in this experiment were only combinations of part-of-speech (POS) tags. Uni-grams and bi-grams of POS, and n words from the beginning or

[1] http://chasen.org/~taku/software/CRF++/
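The layout in Figure 3.6 can be illustrated with a minimal Python sketch. It is not the code used in the thesis. It assumes each sentence has already been reduced to a list of POS tags, that a Group A column holds the frequent POS itself when the sentence contains it and nil otherwise, and that Group B holds the last n POS tags of the sentence, padded with nil on the left for short sentences. The function names (frequent_pos, sentence_row, to_crfpp_lines) and the sample POS tags are hypothetical. The output follows the CRF++ training-file convention of one item per line with whitespace-separated columns and the answer tag in the last column.

```python
from collections import Counter

NIL = "nil"  # placeholder for a missing feature value, as in Figure 3.6


def frequent_pos(sentences, m):
    """Return the m most frequent POS tags over all sentences (Group A vocabulary)."""
    counts = Counter(pos for sent in sentences for pos in sent)
    return [pos for pos, _ in counts.most_common(m)]


def sentence_row(pos_tags, vocab, n):
    """Build one feature row: m Group A columns followed by n Group B columns.

    Assumes n >= 1.
    """
    present = set(pos_tags)
    # Group A: the frequent POS itself if the sentence contains it, else nil.
    group_a = [pos if pos in present else NIL for pos in vocab]
    # Group B: the last n POS tags, left-padded with nil for short sentences.
    tail = pos_tags[-n:]
    group_b = [NIL] * (n - len(tail)) + tail
    return group_a + group_b


def to_crfpp_lines(sentences, chunk_tags, m, n):
    """Render one line per sentence, with the chunk tag as the last column."""
    vocab = frequent_pos(sentences, m)
    return [
        " ".join(sentence_row(sent, vocab, n) + [tag])
        for sent, tag in zip(sentences, chunk_tags)
    ]


if __name__ == "__main__":
    # Hypothetical POS sequences for three sentences and their chunk tags.
    sents = [["NN", "VB", "."], ["WP", "NN", "?"], ["NN", "VB"]]
    tags = ["O", "B-W", "O"]
    for line in to_crfpp_lines(sents, tags, m=2, n=2):
        print(line)
```

In this sketch each sentence is one item of the sequence, so the window of size 3 mentioned in the figure would be expressed in the CRF++ feature template (references to the previous, current and next rows) rather than in the data file itself.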
