page. Past questions and their answers are categorized into Life, Hobby, and so forth. I selected 21 categories and collected the 200 latest queries as of July 21, 2006 from each category, gathering 4,200 queries in total. The 21 categories are gardening, town/local information, healthcare, law, economy, mass media/communication, news, social issues, politics, history, archeology, Japanese language, biology, automobiles, domestic travel, stocks, restaurants/eating houses, software/freeware, finance/accounting, side jobs/part-time jobs, and mental health. From the obtained queries, I selected the 3,993 queries to which answers had been given and then kept only those containing at least two sentences. After checking the contents and excluding queries whose questions were unclear, I obtained 3,628 queries. I then sampled 2,000 of the 3,628 queries at random and created sets of queries for annotation. Besides the 2,000 queries, I used 234 queries that had been collected in 2001 for research from six categories (gardening, healthcare, economy, sociology, politics, and law) on the same website in the same manner. In the data sets thus created, a query contains 5.7 sentences on average, with a deviation of 3.9. The average sentence length is 73.9 bytes, with a deviation of 51.8.

3.3.2 Overview of question type annotation

Question types were manually tagged using the ten question types listed in Table 3.1. The annotators tagged the passages they considered necessary to identify one question and its question type. Consequently, one question was expressed by a set of several text passages. A passage boundary could fall at any character and did not have to coincide with the start or end of a sentence. Only one question type could be assigned to a passage. For this reason, one sentence could contain several non-overlapping passages tagged with different question types.
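The annotation scheme described above can be pictured as a set of character-offset spans per query. The following is only a minimal sketch under assumptions: the class name `Passage`, the field names, and the helper functions are hypothetical and are not taken from the annotation tool; the one-letter codes follow the abbreviations used in Table 3.2.

```python
from dataclasses import dataclass

# Type codes as abbreviated in Table 3.2 (Y, D, N, W, R, L, E, C, T, OT).
QUESTION_TYPES = {"Y", "D", "N", "W", "R", "L", "E", "C", "T", "OT"}


@dataclass(frozen=True)
class Passage:
    """Hypothetical record for one tagged passage in a query."""
    start: int        # inclusive character offset; any character is allowed
    end: int          # exclusive character offset
    qtype: str        # exactly one question type per passage
    question_id: int  # passages sharing an id together express one question


def validate(passages):
    """Check type codes and offsets, and reject overlapping passages."""
    ordered = sorted(passages, key=lambda p: (p.start, p.end))
    for p in ordered:
        if p.qtype not in QUESTION_TYPES:
            raise ValueError(f"unknown question type: {p.qtype}")
        if not 0 <= p.start < p.end:
            raise ValueError(f"invalid span: {p.start}-{p.end}")
    for a, b in zip(ordered, ordered[1:]):
        if b.start < a.end:
            raise ValueError("tagged passages must not overlap")
    return ordered


def group_by_question(passages):
    """Collect the set of passages that expresses each question."""
    groups = {}
    for p in validate(passages):
        groups.setdefault(p.question_id, []).append(p)
    return groups
```

Because boundaries are plain character offsets, two passages with different types can sit inside the same sentence as long as their spans do not overlap, which is the situation the annotation rules allow.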
The query was presented to the rater without showing its answer or question title.

Tagrin (http://kagonma.org/tagrin/docs/main.html) was used as the question type annotation tool [173]. The corpus was divided into two parts, and the articles in each part were classified by one of two operators. Furthermore, the 234 queries collected in 2001 were tagged by another operator besides the above-mentioned two operators. The question type annotation results were compared with those of the other two persons to calculate inter-annotator agreement.

3.3.3 Analysis of assigned tags

The results of question type annotation according to the settings described in Section 3.3.2 are shown in Table 3.2. The right column of the table gives the frequency of tagged passages for each question type, arranged in descending order of frequency from the top. The value next to each frequency is the cumulative ratio of the frequencies down to that row to the total frequency of all passages.

Table 3.2. Classified Given Question Types.

Question-Types      Number of Passages
Yes-No (Y)          1709 / .43
Description (D)      636 / .59
Name (N)             454 / .71
How-to (W)           325 / .79
Reason (R)           304 / .87
Location (L)         197 / .92
Evaluation (E)       141 / .95
Consultation (C)     106 / .98
Time (T)              63 / 1.00
Others (OT)           10 / 1.00
Total               3945

In total, 3945 passages related to questions were confirmed, along with 1252 articles each containing more than one question item; the number of question items per article was 1.77. There were 98 questions in which the passage corresponding to one question item extended over more than one sentence. There were 188 sentences each containing more than one question item, accounting for about 5% of all sentences containing question items.
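The cumulative ratios in Table 3.2 and the figure of 1.77 question items per article can be re-derived from the raw frequencies. The sketch below is only an illustrative check: the variable names are hypothetical, and it assumes that the denominator for the per-article figure is the 2,000 sampled queries plus the 234 queries from 2001, which the text does not state explicitly.

```python
# Frequencies of tagged passages per question type, copied from Table 3.2.
counts = [
    ("Yes-No", 1709),
    ("Description", 636),
    ("Name", 454),
    ("How-to", 325),
    ("Reason", 304),
    ("Location", 197),
    ("Evaluation", 141),
    ("Consultation", 106),
    ("Time", 63),
    ("Others", 10),
]

total = sum(n for _, n in counts)

# Cumulative ratio: running sum of frequencies divided by the total.
rows = []
running = 0
for name, n in counts:
    running += n
    rows.append((name, n, round(running / total, 2)))

# Assumption: every annotated article is counted (2,000 + 234 queries).
num_articles = 2000 + 234
items_per_article = round(total / num_articles, 2)

for name, n, ratio in rows:
    print(f"{name:<14}{n:>5} / {ratio:.2f}")
print(f"Total {total}, question items per article {items_per_article}")
```

Under that assumption the computed values agree with the table and with the reported 1.77, which suggests the per-article figure averages over all annotated articles rather than only those containing several question items.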