12.07.2015 Views

file - ChaSen - 奈良先端科学技術大学院大学



I will introduce fundamental technologies of information retrieval and natural language processing.

2.3 Fundamental technologies and related work

When seeking answers in an information source, a QA system does not always use whole articles. Typically, it extracts the segments that are expected to contain the answer and discards the parts of the source that consist of noise or unnecessary information. These technologies are called text segmentation or page segmentation [17, 43, 100, 161, 179], or passage retrieval [61]. If the source contains much noise, as Web documents often do, text cleaning [148] should be performed prior to segmentation.

Although segmentation schemes are diverse, tables and lists are particularly useful for finding answers. Because they summarize the information source, they can be expected to contain answers. Several techniques for finding tables and lists in a document, known as table and list detection [4, 93, 164, 165], have been proposed [37, 79, 102, 103]. After segmentation, text clustering or text categorization is performed to classify the segments by topic and domain [132, 145, 186]. When necessary, sentence extraction [133] is conducted.

The processes mentioned above are often driven by diverse heuristic rules [41, 94]. There are also approaches, such as wrapper induction, that acquire such rules from the documents automatically or semi-automatically [21, 142, 167].

The process of extracting questions and answers from a sentence relies heavily on various techniques and resources of natural language processing. Sentence type identification [63, 138, 147] and anaphora resolution [35, 39, 52, 53, 64, 65, 98] are often conducted. To extract phrases that could be candidate answers, especially for factoid questions, named entity recognition [104, 118, 123, 127] or noun phrase analysis [1, 159, 176, 188] is performed. This processing may incorporate large electronic thesauri and dictionaries [27, 54], chunkers [74, 144], and parsers of some kind [76]. Moreover, many kinds of mining technologies that acquire patterns are used both to extract answers from source articles by pattern matching and to obtain knowledge for named entity recognition [62, 74, 75].
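
The idea of extracting only the segments likely to contain an answer can be illustrated with a small sketch. It is not the method of any system cited above. The fixed-size sentence window, the term-overlap score, the stopword list, and every function name are assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "were", "of", "in", "on", "at",
    "to", "what", "where", "who", "when", "which", "how",
}


def split_sentences(text):
    """Naive sentence splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def make_passages(sentences, size=2):
    """Group sentences into non-overlapping passages of `size` sentences."""
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


def score_passage(passage, query_terms):
    """Count how many distinct query terms occur in the passage."""
    passage_terms = set(tokenize(passage))
    return len(query_terms & passage_terms)


def retrieve_passages(text, question, size=2):
    """Return (score, passage) pairs, best score first.

    Ties keep document order because Python's sort is stable.
    """
    query_terms = {t for t in tokenize(question) if t not in STOPWORDS}
    passages = make_passages(split_sentences(text), size)
    scored = [(score_passage(p, query_terms), p) for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `retrieve_passages(document_text, "Where is the museum located?")` returns the passages ranked by how many content words of the question they share. A real QA system would replace the overlap count with a stronger ranking function and would run text cleaning before segmentation, as described above.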
