file - ChaSen - 奈良先端科学技術大学院大学


Step 1 Segment a question article into sentences. Each segment is terminated with a period “.”.
Step 2 Carry out chunking by article.
Step 3 Extract question segments as chunks, identify the question types, and output them in pairs.

The chunker divides a sequence of sentences into question segments and other chunks. A chunk tag is given to each sentence. The chunk tags used are of the five types explained in Section 3.4.1, namely IOB1, IOB2, IOE1, IOE2, and IOBES, together with the IO-tag, which does not distinguish the B, E, and S tags from the others. Sentences not involved in the identification of question types are given the O-tag. Sentences that constitute a question segment are given a tag that combines one of the letters I, B, E, and S with one of the letters W and D, for example I-W or B-D, to represent the position in the chunk and the question type. Figure 3.5 shows an example of the composition of chunks using the IOB-tags. The chunker learns a chunking model from the pairs of sentences and their chunk tags shown in Figure 3.5. To extract question segments from a query, sentence labeling, which assigns a chunk tag to each sentence, is performed first. Subsequently, sentences labeled with the same role, such as “-D” or “-W”, are grouped into chunks by post-processing. As a result, each question segment is extracted as a chunk, and its question type is determined from the label of the chunk.

3.4.3 Conditional random field

The CRF (Conditional Random Field) is a stochastic model for sets. Two random variables that represent the properties of a set are associated with each other through a conditional probability [Lafferty 2001]. The CRF assumes a random field that has the Markov property with respect to the elements of the set to be observed.
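
For orientation, the linear-chain form of the CRF that is commonly used for sequence labeling can be written as below. This is the standard textbook form, not a formula taken from this text. The symbols are notation assumed here for illustration: x is the sequence of sentences in an article, y is the corresponding sequence of chunk tags, f_k are feature functions, λ_k are their weights, and Z(x) is the normalization term.

```latex
\[
P(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\!\Bigl( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Bigr),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'} \exp\!\Bigl( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Bigr)
\]
```

Only the conditional distribution P(y | x) is modeled in this form; no distribution over x itself appears in it.
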
The advantages of the CRF are as follows: (1) there is no need to assume the independence of random variables, as is required in the Markov model; (2) since the model is described with conditional random variables, the model parameters can be estimated without calculating the distribution of the random variables in the condition.
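
The post-processing step described before Section 3.4.3, in which labeled sentences are grouped into question segments, can be sketched as follows. This is a minimal illustration and not the author's implementation: it assumes IOB2-style tags (every segment starts with a B- tag), and the function name `extract_question_segments` and the sample sentences are hypothetical.

```python
from typing import List, Optional, Tuple


def extract_question_segments(
    sentences: List[str], tags: List[str]
) -> List[Tuple[str, List[str]]]:
    """Group tagged sentences into (question_type, sentences) pairs.

    Assumes IOB2-style tags such as "B-W", "I-W", "B-D", and "O".
    """
    if len(sentences) != len(tags):
        raise ValueError("sentences and tags must have the same length")

    segments: List[Tuple[str, List[str]]] = []
    current_type: Optional[str] = None
    current: List[str] = []

    for sentence, tag in zip(sentences, tags):
        if tag == "O":
            # A sentence outside any question segment closes the open chunk.
            if current:
                segments.append((current_type, current))
            current_type, current = None, []
            continue

        position, question_type = tag.split("-", 1)
        # Open a new chunk on a B tag, or when the question type changes.
        if position == "B" or question_type != current_type:
            if current:
                segments.append((current_type, current))
            current_type, current = question_type, [sentence]
        else:
            current.append(sentence)

    # Flush the chunk that is still open at the end of the article.
    if current:
        segments.append((current_type, current))
    return segments


# Hypothetical example: five sentences with their predicted chunk tags.
example_sentences = ["s1.", "s2.", "s3.", "s4.", "s5."]
example_tags = ["O", "B-W", "I-W", "B-D", "O"]
example_segments = extract_question_segments(example_sentences, example_tags)
```

Each returned pair holds the question type letter (W or D) and the sentences of one question segment, which corresponds to the output of Step 3.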
