12.07.2015

ChaSen - Nara Institute of Science and Technology

consisting of a variety of domains. Patterns performed well with mutual information filtering on a data set spanning different domains and genres. It appears that N-grams and credible patterns are effective in capturing the common characteristics of procedural expressions across different domains. The patterns may also be effective for moderately narrowing the range of answer candidates in the early stages of QA and Web information retrieval. In the Computer domain, categorization performed well overall in every POS group. This is because the domain includes many instruction documents, for instance on software installation, computer settings, and online shopping, and these usually use similar and restricted vocabularies. Conversely, the uniformity of procedural expressions in the Computer domain leads to poorer performance when learning from documents of the Computer domain than when learning from the Others domain. In these documents I also often found special characters placed next to a particular class of content words (see Figure 5.3). This type of pattern occasionally contributed to correct classification in my experiment. The way the performance of content words and function words changes as N-grams are added is notable. Exploiting the difference between these two trends more directly is likely to be useful in the categorization of procedural text.

Error analysis yielded the following patterns: those that reflected common expressions, including multiple occurrences of verbs with the case-marking particle wo. These worked well when procedural statements occupied only some of the items in a list. Pattern mismatches were observed where a list contained few characters and POS tagging failed.

5.10 Summary

The present work has demonstrated effective features that can be used to categorize lists in web pages by whether they explain a procedure.
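The discussion credits N-gram features and patterns filtered by mutual information. As a minimal sketch of that filtering step, assuming binary presence features over word N-grams and the standard mutual information I(F; C) between feature presence and class label (this section does not give the thesis's exact scoring formula, tokenization, or POS grouping), it could look like the following. The function names `ngrams`, `extract_features`, `mutual_information`, and `select_features`, and the romanized toy list items in `TOY_ITEMS`, are hypothetical illustrations, not the author's implementation or data.

```python
import math
from collections import Counter

# Hypothetical toy data: romanized, pre-tokenized list items (not from the thesis).
TOY_ITEMS = [
    (["fairu", "wo", "hiraku"], "procedural"),
    (["botan", "wo", "kurikku", "suru"], "procedural"),
    (["kyou", "wa", "hare", "desu"], "other"),
    (["fairu", "wa", "ookii", "desu"], "other"),
]


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token sequence as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(tokens, max_n=2):
    """Return the set of 1..max_n-gram features present in one list item."""
    feats = set()
    for n in range(1, max_n + 1):
        feats.update(ngrams(tokens, n))
    return feats


def mutual_information(docs, labels, feature):
    """I(F; C) in bits between binary presence of `feature` and the class label."""
    n = len(docs)
    # Count only observed (presence, label) cells, so log2 never sees zero.
    joint = Counter((feature in feats, label) for feats, label in zip(docs, labels))
    f_marg = Counter()
    c_marg = Counter()
    for (present, label), count in joint.items():
        f_marg[present] += count
        c_marg[label] += count
    mi = 0.0
    for (present, label), count in joint.items():
        p_joint = count / n
        p_indep = (f_marg[present] / n) * (c_marg[label] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


def select_features(docs, labels, k):
    """Keep the k features with the highest mutual information (ties broken by feature)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda f: (-mutual_information(docs, labels, f), f))
    return ranked[:k]


if __name__ == "__main__":
    docs = [extract_features(tokens) for tokens, _ in TOY_ITEMS]
    labels = [label for _, label in TOY_ITEMS]
    for feature in select_features(docs, labels, k=3):
        print(feature, round(mutual_information(docs, labels, feature), 3))
```

On the toy data this prints `('desu',) 1.0`, `('wa',) 1.0`, and `('wo',) 1.0`: those unigrams separate the two classes perfectly, while a feature such as `('fairu',)`, which occurs once in each class, scores 0. Presence-based features keep the sketch simple; a real system would score features on the training split only and would draw tokens and POS tags from a morphological analyzer.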
I show that categorization for extracting texts that include procedural expressions differs from traditional text categorization tasks with respect to the features and behaviors