file - ChaSen - 奈良先端科学技術大学院大学


Table 3.9. Question Segmentation with Different Chunk Tag Sets.

Tag   IO    IOB1  IOB2  IOE1  IOE2  IOBES
I     .76   .74   .14   .73   .11   N/A
O     .94   .94   .94   .94   .94   .94
B     -     .16   .74   -     -     .11
E     -     -     -     .13   .73   .15
S     -     -     -     -     -     .72

... of a question segment also shows lower performance. The same tendency appears in the experimental results for the E tag in IOE1 and IOE2. With the IOBES tag set, the S tag, for a question segment with no adjacent question segment, shows a high F-measure, but the performance of the I/B/E tags remains lower.

3.5 Discussion

The results of this experiment did not meet our expectations. In particular, the performance of type identification falls well short of the results reported in previous studies on single-sentence questions. In question extraction, the F-measure was about 0.6 at most. However, this result does not necessarily lead to a pessimistic conclusion. For instance, in text summarization many methods of topic-based text segmentation have been proposed. They include studies on documents with fixed document styles, such as newspapers, meeting minutes, papers and patents, in which segmentation accuracy is about 0.7-0.8 in most cases. In studies targeting Web pages and spoken language, the accuracy of topic segmentation is even lower. The segmentation in this thesis has to perform question type identification in addition to segmenting question articles from the Web. Nevertheless, we used only part-of-speech n-grams in this experiment.

When question segment extraction fails, the errors often appear at the boundaries of adjacent question segments and inside segments comprising two or more sentences. At the boundaries of adjacent segments, using the IOB2/IOE2/IOBES tag sets improved performance.
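To make the tag sets compared in Table 3.9 concrete, the following is a minimal sketch of how one sequence of sentences could be labeled under each scheme. It assumes the conventions commonly used in chunking (B marks a segment start, E a segment end, S a single-sentence segment; IOB1 and IOE1 use B and E only between adjacent segments). The thesis defines its own tag sets earlier in the document, and those definitions may differ in detail. The names `encode`, `Segment` and `SCHEMES` are hypothetical and exist only for this illustration.

```python
from typing import List, Tuple

# A question segment as (first, last) sentence indices, both inclusive.
# This representation is an assumption made for the illustration.
Segment = Tuple[int, int]

SCHEMES = ("IO", "IOB1", "IOB2", "IOE1", "IOE2", "IOBES")


def encode(n_sentences: int, segments: List[Segment], scheme: str) -> List[str]:
    """Label each sentence with a chunk tag under the given scheme."""
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme}")
    tags = ["O"] * n_sentences
    ordered = sorted(segments)
    for k, (start, end) in enumerate(ordered):
        # Every sentence of a segment is "I" unless a scheme overrides it.
        for i in range(start, end + 1):
            tags[i] = "I"
        # Does this segment directly touch the previous / next segment?
        follows = k > 0 and ordered[k - 1][1] == start - 1
        precedes = k + 1 < len(ordered) and ordered[k + 1][0] == end + 1
        if scheme == "IOB1" and follows:
            tags[start] = "B"   # B only when a segment follows another one
        elif scheme == "IOB2":
            tags[start] = "B"   # B at the head of every segment
        elif scheme == "IOE1" and precedes:
            tags[end] = "E"     # E only when another segment follows
        elif scheme == "IOE2":
            tags[end] = "E"     # E at the tail of every segment
        elif scheme == "IOBES":
            if start == end:
                tags[start] = "S"               # single-sentence segment
            else:
                tags[start], tags[end] = "B", "E"
    return tags


# Six sentences; a two-sentence segment directly followed by a
# one-sentence segment, surrounded by non-question sentences.
example = [(1, 2), (3, 3)]
for name in SCHEMES:
    print(f"{name:6s}", " ".join(encode(6, example, name)))
```

In this example the IO scheme labels sentences 1 to 3 identically, so the boundary between the two adjacent segments cannot be recovered from the tags alone, whereas IOB2, IOE2 and IOBES mark it explicitly. Those explicit schemes also leave fewer sentences carrying the I tag, which is consistent with the shortage of positive examples for sentences inside a chunk noted in the discussion.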
However, when using IOB2/IOE2/IOBES, the performance of labeling sentences inside a chunk declined instead. Because our corpus contains few chunks of this kind, the positive examples of this case are considered insufficient for machine learning.

There are seventeen question segments comprising multiple sentences in the test dataset. The sentence representing a question or request appears at the head of the segment in one case, at the tail of the segment in nine cases, and at both the head and the tail of the segment in six cases. One case has no sentence representing a question or request. Question type identification failed for these question segments. In the evaluation of sentence labeling, the best result was obtained with IOE1 labeling, in which four sentences were correctly labeled out of the 34 head and tail sentences of the 17 question segments.

This thesis proposed a chunking-based question analysis that performs question segmentation and question type identification concurrently, aiming to solve two problems in question analysis at the same time. The first problem is the need for a methodology that can handle more complex queries, which comprise multiple questions or questions described by multiple sentences; the second is to reduce the computational cost of previous techniques. The proposed method can solve these problems in theory; however, the accuracy in the experimental results has not yet reached a practical level.

The experimental results show that the same features behave in opposite ways in question segmentation and question type identification. In general, it should be difficult to address two such dissimilar problems in a single computational model. The proposed method did not take this aspect of the problem into account.
Concurrent processing of question segmentation and question type identification is effective in reducing computational cost; however, it became clear that it does not suit a setting in which question segmentation and type identification have different properties. Therefore, as the next step, I am going to change the strategy to one that uses different models for question segmentation and question type identification, and attempt to reduce the computational cost within that framework.

Another important observation in the experimental results is that many errors of question segmentation and type identification occurred in sentences containing many ellipses. The process that identifies an ellipsis and completes it by any relevant ...
