C. Comparison of the STD and WSTD Models
In the experiments, the GHAC algorithm is chosen for the reasons given in [5]; it measures the similarity of two clusters as the average of the pairwise similarities between the documents of the two clusters. The effectiveness of the two document models, STD and WSTD, is evaluated with the GHAC algorithm in the comparison experiments.
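The group-average cluster similarity used by GHAC can be sketched as follows. This is a minimal illustration with cosine similarity over sparse term-weight vectors; the function and variable names are our own, not taken from [5]:

```python
import math
from itertools import product

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts: term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def group_average_similarity(cluster_a, cluster_b):
    """Average of pairwise document similarities across two clusters,
    i.e., the merge criterion of group-average HAC (GHAC)."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(cosine(da, db) for da, db in pairs) / len(pairs)
```

At each GHAC step, the two clusters with the highest group-average similarity would be merged, until the stop criterion (here, the target number of clusters) is reached.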
In the comparison experiments, two subsets of each original data set are selected, giving six small data sets: DS1-1, DS1-2, DS2-1, DS2-2, DS3-1, and DS3-2, as shown in Table 2, where #Cates denotes the number of categories and #Docs denotes the number of documents. For the WSTD model, the coefficient wH is set to 2 times wL, and wM to 1.5 times wL. The stop criterion of GHAC is adjusted so that the number of clusters equals the number of classes in the original class set of each data set.
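The coefficient setting above can be sketched as follows. The three significance levels and the 2x / 1.5x ratios follow the text; the base value wL = 1.0 and the mapping from HTML tags to levels are illustrative assumptions, not taken from the paper:

```python
# Significance-level weights for sentences in the WSTD model.
# wH = 2 * wL and wM = 1.5 * wL, as in the experiments; the base
# value and the tag-to-level mapping below are assumptions.
W_LOW = 1.0           # wL
W_MID = 1.5 * W_LOW   # wM
W_HIGH = 2.0 * W_LOW  # wH

LEVEL_OF_TAG = {
    "title": W_HIGH, "h1": W_HIGH,   # most significant parts of a page
    "h2": W_MID, "b": W_MID,         # moderately significant parts
    "p": W_LOW,                      # plain body text
}

def sentence_weight(tag):
    """Weight assigned to a sentence according to the part of the Web page it comes from."""
    return LEVEL_OF_TAG.get(tag, W_LOW)
```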
Table 3 and Table 4 report, respectively, the F-measure and Entropy scores computed from the clustering results of the GHAC algorithm with the STD and WSTD models, together with the improvement achieved by the WSTD model on each score. The improvements on both the F-measure and Entropy scores indicate that the new weighted phrase-based document clustering approach based on the WSTD model tends to be a highly accurate document clustering approach.
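The two evaluation scores can be computed as follows. This sketch uses the common clustering definitions of F-measure (best F per true class, weighted by class size) and weighted average cluster entropy; the paper's exact formulas are not reproduced here, so treat this as an assumption about the standard metrics:

```python
import math
from collections import Counter

def cluster_entropy(clusters):
    """Weighted average entropy over clusters; each cluster is a list of
    true class labels. Lower is better (0 means every cluster is pure)."""
    n_total = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        counts = Counter(c)
        h = -sum((k / len(c)) * math.log2(k / len(c)) for k in counts.values())
        total += (len(c) / n_total) * h
    return total

def f_measure(clusters):
    """Overall F-measure: for each true class, take the best F score over
    all clusters, then weight by class size. Higher is better (max 1.0)."""
    labels = Counter(l for c in clusters for l in c)
    n_total = sum(labels.values())
    score = 0.0
    for cls, n_cls in labels.items():
        best = 0.0
        for c in clusters:
            tp = sum(1 for l in c if l == cls)
            if tp == 0:
                continue
            p, r = tp / len(c), tp / n_cls
            best = max(best, 2 * p * r / (p + r))
        score += (n_cls / n_total) * best
    return score
```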
D. Performance Evaluation
The time cost and the number of nodes mapped into the vector space are shown in Table 2, where #Nodes denotes the number of nodes mapped into the vector space, #All denotes the total execution time of the approach, and #GHAC denotes the time of the GHAC algorithm excluding the time spent building the WSTD model. The unit of time is the millisecond (ms). Comparing #All with #GHAC, it can be concluded that the main time cost comes from the GHAC algorithm, while constructing the WSTD model takes little time.

TABLE II. DATA SETS, NODES AND TIME

Data Sets  #Cates  #Docs  #Nodes  #All (ms)  #GHAC (ms)
DS1-1      5       100    11162   18344      16219
DS1-2      5       142    20969   64671      60515
DS2-1      5       100    6422    10313      9172
DS2-2      5       200    13894   115656     112563
DS3-1      5       100    439     985        844
DS3-2      5       220    1109    11641      11454

TABLE III. IMPROVEMENT ON F-MEASURE

Data Sets  STD    WSTD   Improvement
DS1-1      0.899  0.960  6.7%
DS1-2      0.957  0.977  2.1%
DS2-1      0.542  0.702  29.4%
DS2-2      0.569  0.614  7.8%
DS3-1      0.817  0.909  11.3%
DS3-2      0.821  0.915  11.5%

TABLE IV. IMPROVEMENT ON ENTROPY

Data Sets  STD    WSTD   Improvement
DS1-1      0.112  0.065  42.2%
DS1-2      0.064  0.040  37.4%
DS2-1      0.419  0.318  24.2%
DS2-2      0.368  0.358  2.6%
DS3-1      0.254  0.178  29.8%
DS3-2      0.185  0.157  15.2%
E. The Variable Length of Phrases
In the WSTD model, a document is partitioned into sentences, and the length of a sentence is assumed to be bounded by a constant. The phrases extracted from the WSTD model are therefore also bounded by a constant. As described in [4, 6], phrases of length no greater than 6 are appropriate and efficient.
The phrases of the six lengths from 1 to 6 account for about 99 percent of all the nodes mapped into the VSD model on DS2-1 and DS3-1. The distributions of these six phrase lengths indicate that the most effective phrases can be extracted from the WSTD model.
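The bounded-length phrase extraction described above can be sketched as an n-gram pass over each sentence. This is a simplified illustration: in the WSTD approach the equivalent phrases arise as node labels of the weighted suffix tree built over sentences, not from direct enumeration:

```python
def extract_phrases(sentence, max_len=6):
    """Enumerate all word n-grams of length 1..max_len from one sentence.
    Bounding max_len at 6 mirrors the phrase-length bound used in the paper."""
    words = sentence.split()
    phrases = []
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            phrases.append(" ".join(words[i:i + n]))
    return phrases
```

Each distinct phrase then corresponds to one dimension of the VSD model, with its weight derived from the significance of the sentences it occurs in.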
V. CONCLUSIONS
By mapping all internal nodes of the STD model into the VSD model, the phrase-based similarity successfully connects the two models.
For Web documents, different parts are assigned different levels of significance. A document is partitioned into sentences that carry these significance levels, and each sentence is represented as a sequence of words. The weighted suffix tree is then built from these sentences instead of from whole documents, extending the STD model to the WSTD model with weighted nodes. On this basis, the weighted phrase-based document similarity is proposed. The significant improvement of clustering quality in the experiments clearly indicates that the new clustering approach with the weighted phrase-based document similarity is more effective for clustering Web documents.
By using sentences to build the WSTD model, the most effective phrases can be extracted from it; the WSTD model thus serves as an n-gram technique for identifying and extracting weighted phrases from Web documents.
ACKNOWLEDGMENT
This work was partly supported by the National Key Technology R&D Program of China (No. 2007BAH08B04), the Chongqing Key Technology R&D Program of China (No. 2008AC20084), the CSTC Research Program of Chongqing of China (No. 2009BB2203), and the Postdoctoral Science Foundation of China (No. 20090450091).
REFERENCES
[1] N. Oikonomakou and M. Vazirgiannis, "A Review of Web Document Clustering Approaches," in Data Mining and Knowledge Discovery Handbook, Springer US, 2005, pp. 921-943.
[2] L. Yanjun, "Text Clustering with Feature Selection by Using Statistical Data," IEEE Transactions on Knowledge and Data Engineering, vol. 20, pp. 641-652, 2007.