I will introduce fundamental technologies of information retrieval and natural language processing.

2.3 Fundamental technologies and related work

When seeking answers in an information source, a QA system does not always exploit whole articles. Typically, segments that are expected to include the answer are extracted, while the parts composed of noise and unnecessary information are eliminated from the sources. These technologies are called text segmentation or page segmentation [17, 43, 100, 161, 179], or passage retrieval [61]. If the source contains much noise, as is often the case with Web documents, text cleaning [148] should be performed prior to segmentation.

Although segmentation schemes are diverse, tables and lists are particularly useful for finding answers. Because they are a kind of summary of the information source, they can be expected to contain answers. Several techniques for finding tables and lists in a document, known as table and list detection [4, 93, 164, 165], have been proposed [37, 79, 102, 103]. After segmentation, text clustering or text categorization is performed to classify segments by topic and domain [132, 145, 186]. When necessary, sentence extraction [133] is also conducted.

The processes mentioned above are often driven by diverse heuristic rules [41, 94]. There are approaches, such as wrapper induction, that automatically or semi-automatically acquire such rules from documents [21, 142, 167].

The process of extracting questions and answers from a sentence heavily incorporates various techniques and resources of natural language processing. Sentence type identification [63, 138, 147] and anaphora resolution [35, 39, 52, 53, 64, 65, 98] are often conducted. To extract phrases that could be candidate answers, especially in the case of a factoid question, named entity recognition [104, 118, 123, 127] or noun phrase analysis [1, 159, 176, 188] may be performed.
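As a minimal, self-contained illustration of the passage retrieval step described above, the following sketch splits a document into overlapping sentence windows and ranks them by query-term overlap. The function name, window size, and scoring scheme are illustrative assumptions, not taken from the systems cited here:

```python
import re
from collections import Counter

def retrieve_passages(document, query, window=3, top_k=2):
    """Split a document into overlapping sentence windows and rank them
    by query-term overlap -- a crude stand-in for passage retrieval."""
    # Naive sentence segmentation on sentence-final punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    query_terms = set(query.lower().split())
    scored = []
    for i in range(max(1, len(sentences) - window + 1)):
        passage = " ".join(sentences[i:i + window])
        terms = Counter(passage.lower().split())
        # Score a passage by how many query-term occurrences it contains.
        score = sum(terms[t] for t in query_terms)
        scored.append((score, passage))
    scored.sort(key=lambda pair: -pair[0])
    return [passage for score, passage in scored[:top_k] if score > 0]
```

Real passage retrieval systems use far stronger term weighting (e.g. tf-idf) and density measures, but the window-and-rank structure is the same.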
In this processing, a large electronic thesaurus and dictionaries [27, 54], chunkers [74, 144], and some kinds of parsers [76] may be incorporated. Moreover, many kinds of mining technologies that acquire patterns are used to extract answers from source articles by pattern matching, and to obtain knowledge for named entity recognition [62, 74, 75].
After extraction of answer passages, alignment and organization are performed based on the identified relations and similarities between segments, sentences, and passages [40, 55, 101, 109, 121, 131, 174].

Stochastic machine learning is heavily used as an underlying methodology for the natural language processing described above [20, 22, 23, 25, 29, 67, 78, 84, 153].

Current natural language processing, however, has not yet reached a level where it can analyze the meaning represented by natural language, so a number of problems remain that can hardly be solved with natural language processing alone. If, in addition to the natural language itself, we could exploit supplementary information representing its meaning, such as semantic annotation, the accuracy and coverage of question answering would improve [42]. In Web information retrieval, the development of tagging schemes based on the Semantic Web [12] has progressed [86].

In QA systems dealing with queries that require descriptive answers, many kinds of tagging schemes have been used to acquire the linguistic knowledge exploited in question analysis, answer extraction, summarization of answers, and so on [10, 16, 45, 46, 56, 57, 85, 110, 140, 143], because linguistic knowledge is necessary to identify logical or rhetorical relations between sentences. Lately, annotation schemes for spoken language have attracted the attention of many researchers [49, 180].

The design of an annotation scheme should be discussed together with annotation tools and the annotation environment. There have been many studies on the efficiency of corpus construction and on sharing the knowledge needed for consistent annotation among annotators [50, 97, 173]. Additionally, there is the problem of how to manage annotation results, such as disagreements between annotators [105, 129].
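As one concrete instance of the stochastic machine learning methods used for the text categorization step, a multinomial naive Bayes classifier with add-one smoothing might look like the following. This is a minimal sketch; the class name, training interface, and example labels are illustrative:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesCategorizer:
    """Multinomial naive Bayes with add-one smoothing: a simple
    stochastic model for classifying text segments by topic."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # per-label word frequencies
        self.label_counts = Counter(labels)      # label priors
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

The cited work employs considerably richer models (e.g. boosting and support vector machines), but the underlying probabilistic-classification idea is the same.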