file - ChaSen - 奈良先端科学技術大学院大学

More documents

Recommendations

Info

Table 5.4. Statistics of Data Sets.Proc. Non-Proc. Comp. OthersLists 721 3399 2224 1896Items 4.6 / 2.8 4.9 / 5.7 4.8 / 6.1 4.9 / 4.4Sen. 1.8 / 1.7 1.3 / 0.9 1.5 / 1.1 1.3 / 1.1Char. 40.3 / 48.6 32.6 / 42.4 35.6 / 40.1 32.6 / 48.2which features are effective, and I must use many features in categorization toinvestigate their effectiveness. The dimension of the feature space is relativelyhigh.5.6 Features : sequential patternsSequential pattern mining consists of finding all frequent subsequences, that arecalled sequential patterns, in the database of sequences of literals.Besides conventional pattern matching techniques [38], Apriori [2] and PrefixSpan[62] are examples of sequential pattern mining methods.The Apriori algorithm is one of the most widely used methods, however thereis a great deal of room for improvement in terms of computational cost. ThePrefixSpan algorithm succeed in reducing the cost of computation by performingan operation, called projection, which confines the range of the search to setsof frequent subsequences. Details of the PrefixSpan algorithm are provided inanother paper [62].5.7 Experimental settingsIn the first experiment, to determine the categorization capability of a domain, Iemployed a set of lists in the Computer domain and conducted a cross-validationprocedure. The document set was divided into five subsets of nearly equal size,and five different SVMs, the training sets of four of the subsets, and the remainingone classified for testing. In the second experiment, to determine the categorizationcapability of an open domain, I employed a set of lists from the Others72
domain with the document set in the first experiment. Then, the set of the listsfrom the Others domain was used in the test and the one from the Computerdomain was used in the training, and their training and testing roles were alsoswitched.In both experiments, recall, precision, and, occasionally, F-measurevalue were calculated to evaluate categorization performance. F-measure is calculatedwith precision (P) and recall (R) in formula 5.1 that is same as equation4.8 in Chapter 4.F = 2P R(5.1)P + RThe lists in the experiment were gathered from those marked by the listtags in the pages.To focus on the feasibility of the features in the lists forthe categorization task, the contexts before and after each list are not targeted.Table 5.4 lists four groups divided by procedure and domain into columns, andthe numbers of lists, items, sentences, and characters in each group are in therespective rows. The two values in each cell in Table 5.4 are the mean on the leftand the deviation on the right. I employed Tiny-SVM 2 and a implementation ofPrefixSpan 3 by T. Kudo. To observe the direct effect of the features, the featurevectors were binary, constructed with word N-gram and patterns; polynomialkernel degree d for the SVM was equal to one. Support values for PrefixSpanwere determined in an ad hoc manner to produce a sufficient number of patternsin my experimental conditions.To investigate the effective features for list categorization, feature sets of thelists were divided into five groups (see Table 5.5) with consideration given to thedifference of content word and function words according to my observations (describedin Section 5.4). The values in Table 5.5 indicate the numbers of differencesbetween words in each domain data set.The notation of tags above, such as ‘snp’, follows the categories in Table 5.3.F2 and F3 consist of content words and F4 and F5 consist of function words.F6 was a feature group, which added verbal nouns based on my observations(described in Section 5.4).To observe the performances of SVM, I compared the results of categorizations2 http://chasen.org/˜taku-ku/software/TinySVM/3 http://chasen.org/˜taku-ku/software/prefixspan/73
Page 1:
NAIST-IS-DD0061208Doctoral Disserta
Page 4 and 5:
This thesis studies two fundamental
Page 6 and 7:
F 0.8 F 0.7 , , , , , iv
Page 8 and 9:
3.4.4 Experimental settings . . . .
Page 10 and 11:
List of Tables3.1 Definitions of Qu
Page 12 and 13:
List of Figures1.1 Division of Quer
Page 15 and 16:
Chapter 1Introduction1.1 Motivation
Page 17 and 18:
The Number of Question per QuerySin
Page 19:
answers. Although different distrib
Page 22 and 23:
ComputerHumanComputerHumanSpecializ
Page 24 and 25:
Blog PageMultipleSentenceQueryQ1Q2e
Page 26 and 27:
Data FlowQuestion TypeIdentificatio
Page 28 and 29:
I will introduce fundamental techno
Page 31 and 32:
Chapter 3Question Type Identificati
Page 33 and 34:
used for question sentence type ide
Page 35 and 36: (1) We pay car commuter employees a
Page 37 and 38: Table 3.2. Classified Given Questio
Page 39 and 40: Figure 3.2. Combinations of Questio
Page 41 and 42: Plants can grow indoors. In additio
Page 43 and 44: this thesis. This Chapter proposes
Page 45 and 46: as Inside/Outside [113, 116] and St
Page 47 and 48: Step 1 Segment a question article i
Page 49 and 50: elements be Θ. Then the degree of
Page 51 and 52: Table 3.5. Transition of Question T
Page 53 and 54: Table 3.6. Summary of Experimental
Page 55 and 56: Table 3.8. Results of Chunking Vary
Page 57 and 58: However when using IOB2/IOE2/IOBES,
Page 59 and 60: 3.6 Related workIdentification of t
Page 61 and 62: Chapter 4Categorization of Descript
Page 63 and 64: 30, 77, 154] and those asking reput
Page 65 and 66: elements were usually eliminated. H
Page 67 and 68: Table 4.1. The Definitions of Descr
Page 69 and 70: 4.3.3 Overview of datasetsTable 4.2
Page 71 and 72: Table 4.4. Categorization of n Obje
Page 73 and 74: 4.3.6 DiscussionIn the field of nat
Page 75 and 76: 4.4 Description type based answer c
Page 77 and 78: R = |Rc||Ra|(4.10)Varying combinati
Page 79 and 80: Chapter 5Extraction of ProceduralEx
Page 81 and 82: Table 5.2. Domain and Type of List.
Page 83 and 84: Figure 5.2. Collection of Lists fro
Page 85: Table 5.3. Types of Tags.Tag types
Page 89 and 90: Table 5.6. Result of Close-Domain.C
Page 91 and 92: Table 5.9. Comparison of SVM and De
Page 93: Sentence : “ [menyu] w o s ent a
Page 96 and 97: • It is applicable in case that m
Page 98 and 99: • For extraction of procedural ex
Page 100 and 101: [10] Regina Barzilay. Information F
Page 102 and 103: [29] Yoav Freund and Robert E. Scha
Page 104 and 105: [47] Chiori Hori, Takaaki Hori, Hid
Page 106 and 107: [63] Mingzhe Jin. Authorship attrib
Page 108 and 109: [84] Christopher D. Manning and Hin
Page 110 and 111: of International World Wide Web Con
Page 112 and 113: [122] Akihiro Shinmori, Manabu Okum
Page 114 and 115: [141] Akihiro Tamura, Hiroya Takamu
Page 116 and 117: [161] Yudong Yang and HongJiang Zha
Page 118 and 119: [187] . n-gram . , Vol. 23,No. 5,
Page 120 and 121: - (Evaluation) Yes-No Yes-No (Ho
Page 122 and 123: (Analysis) : (Fact) : (Instance)
Page 125: List of PublicationJournal Papers[1
show all

file - ChaSen - 奈良先端科学技術大学院大学

Create successful ePaper yourself

Delete template?

Save as template?