12.07.2015 Views

file - ChaSen - 奈良先端科学技術大学院大学

file - ChaSen - 奈良先端科学技術大学院大学

file - ChaSen - 奈良先端科学技術大学院大学

SHOW MORE
SHOW LESS
  • No tags were found...

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Table 5.4. Statistics of Data Sets.Proc. Non-Proc. Comp. OthersLists 721 3399 2224 1896Items 4.6 / 2.8 4.9 / 5.7 4.8 / 6.1 4.9 / 4.4Sen. 1.8 / 1.7 1.3 / 0.9 1.5 / 1.1 1.3 / 1.1Char. 40.3 / 48.6 32.6 / 42.4 35.6 / 40.1 32.6 / 48.2which features are effective, and I must use many features in categorization toinvestigate their effectiveness. The dimension of the feature space is relativelyhigh.5.6 Features : sequential patternsSequential pattern mining consists of finding all frequent subsequences, that arecalled sequential patterns, in the database of sequences of literals.Besides conventional pattern matching techniques [38], Apriori [2] and PrefixSpan[62] are examples of sequential pattern mining methods.The Apriori algorithm is one of the most widely used methods, however thereis a great deal of room for improvement in terms of computational cost. ThePrefixSpan algorithm succeed in reducing the cost of computation by performingan operation, called projection, which confines the range of the search to setsof frequent subsequences. Details of the PrefixSpan algorithm are provided inanother paper [62].5.7 Experimental settingsIn the first experiment, to determine the categorization capability of a domain, Iemployed a set of lists in the Computer domain and conducted a cross-validationprocedure. The document set was divided into five subsets of nearly equal size,and five different SVMs, the training sets of four of the subsets, and the remainingone classified for testing. In the second experiment, to determine the categorizationcapability of an open domain, I employed a set of lists from the Others72

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!