Download - Academy Publisher

More documents

Recommendations

Info

Accounts, URL fetch, image manipulation, and email services [8]. 2) Google Apps Google Apps is one of the most sophisticated and comprehensive collaborative products available. The program includes applications for email, calendars, instant messaging, room reservations, document storage and editing and video sharing. 3) Google File system (GFS) GFS provides a reliable distributed storage system that can grow to petabyte scale by keeping data in 64- megabyte “chunks” stored on disks spread across thousands of machines. Each chunk is replicated, usually 3 times, on different machines so GFS can recover seamlessly from disk or machine failure. Figure 3 shows its Architecture. Figure 3. Google File System Architecture. 4) Work queue and MapReduce Work queue provides a handling mechanism for scheduling a job to run on a cluster of machines. It schedules jobs, allocates resources, reports status, and collects the results. Its execution flow is demonstrated in Figure 4. Figure 4. MapReduce Execution overview. D. Eucalyptus Eucalyptus is an open source software infrastructure for creating on-premise clouds on existing Enterprise. It is a service provider infrastructure and an elastic utility computing Architecture [14]. Eucalyptus supports the cloud interface of the popular Amazon Web Services, can work with kinds of hypervisors and virtualization technologies and allows on-premise clouds to interact with public clouds with the help of a common programming interface. Enterprise Eucalyptus can provide customized service level agreements (SLAs), metering, cloud monitoring, and supports for a highly scalable and available cloud platform. Eucalyptus was designed modularly and its components were a set of web services interoperating using standard communication protocols. A description of the components within the Eucalyptus is briefly as follows. 1) Cloud Controller (CLC) 2) Cluster Controller (CC) 3) Node Controller (NC) 4) Storage Controller (SC) Ⅴ. CONCLUSION AND TRENDS In this paper, we have proposed Cloud computing paradigm from a variety of aspects, such as definitions, features, and technologies. Moreover, we have illustrated several representative platforms for the state-of-the-art Cloud computing. Cloud computing needs to be extended to support negotiation of QoS based on Service Level Agreement (SLAs). The corresponding algorithms should be designed for allocation of VM resources to meet SLAs between service providers and end-users. The risks of the violation of SLAs must be managed effectively. Furthermore, we must extend some protocols to support interaction between different Cloud service providers. REFERENCES [1] Windows Azure Platform, v1.3—Chappell. http://www.microsoft.com/windowsazure/. [2] Amazon Elastic Compute Cloud[URL]. http://aws.amazon.com/ec2/ [3] IBM Blue Cloud project. http://www- 03.ibm.com/press/us/en/pressrelease/22613.wss/. [4] Global Cloud computing test bed. http://www.hp.com/hpinfo/newsroom/press/2008/080729x a.html/. [5] http://en.wikipedia.org/wiki/Cloud_computing#cite_note-0 [6] Lamia Youseff, Maria Butrico, Dilma Da Silva. Toward a Unified Ontology of Cloud Computing. GCE'08, 2008. [7] Salesforce Customer Relationships Management (CRM) system, http://www.salesforce.com/. [8] GOOGLE Apps, http://www.google.com/apps/business/index. [9] GOOGLE App Engine, http://code.google.com/appengine. [10] Apex: Salesforce on-demand programming language and framework, http://developer.force.com/. [11] Hadoop, http://hadoop.apache.org/. [12] C. Olston, B. Reed, et al., "Pig latin: a not-so-foreign language for data processing," Proceedings of the 2008 ACM SIGMOD international conference on Management of data. New York, NY, USA: ACM, 2008, pp. 1099-1110. [13] Enomalism elastic computing infrastructure, http://www.enomaly.com. [14] Eucalyptus systems, http://eucalyptus.cs.ucsb.edu/. [15] S. Ghemawat, H. Gobioff, and S.-T. Leung, "The google file system," SIGOPS Oper. Syst. Rev., vol. 37, no. 5, pp. 29-43, 2003. [16] "Amazon simple storage service," http://aws.amazon.com/s3/. [17] Yike Guo. Introduction to Cloud Computing. TR. 11, 2009. 164
ISBN 978-952-5726-09-1 (Print) Proceedings of the Second International Symposium on Networking and Network Security (ISNNS ’10) Jinggangshan, P. R. China, 2-4, April. 2010, pp. 165-169 Weighted Suffix Tree Document Model for Web Documents Clustering Ruilong Yang, Qingsheng Zhu, and Yunni Xia College of Computer Science, Chongqing University, Chongqing, China Email: ruilong.yangrl@gmail.com, qszhu@cqu.edu.cn, xiayunni@yahoo.com.cn Abstract—A novel Weighted Suffix Tree Document (WSTD) model is proposed to construct Web document feature vector for computing the pairwise similarities of documents with weighted phrase. The weighted phrase-based document similarity is applied to the Group-average Hierarchical Agglomerative Clustering (GHAC) algorithm to develop a new Web document clustering approach. First, different document parts are assigned different levels of significance as structure weights. Second, the WSTD is built with sentences and their structure weights. Third, each internal node and its weights in WSTD model are mapped into a unique feature term in the Vector Space Document (VSD) model; the new weighted phrase-based document similarity extends the term TF-IDF weighting scheme in computing the document similarity with weighted phrases. Finally, the GHAC algorithm is employed to generate final clusters. The improvements of evaluation experiments indicate that the new clustering approach is more effective on clustering the documents than the approach which ignores the Web documents structures. In conclusion, the WSTD model much contributes to improving effectiveness of Web documents clustering. Index Terms—suffix tree, web document clustering, weight computing, phrase-based similarity, document structure I. INTRODUCTION In an effort to keep up with the exponential growth of the WWW, many researches are targeted on how to organize such information in a way that will make it easier for the end users to find the information efficiently and accurately. One of these is document clustering [1-4] which attempt to segregate the documents into groups where each group represents a topic that is different than those topics represented by the other groups[4]. In general, the clustering techniques are based on four concepts [4, 5]: data representation model, similarity measure, clustering model, and clustering algorithm. Most of the current documents clustering methods are based on the Vector Space Document (VSD) model. The common framework of this data model starts with a representation of any document as a feature vector of the words that appear in the documents of a data set. To achieve a more accurate document clustering, phrase has been considered as a more informative feature term in recent research work and literature [3-7]. A Supported by National Key Technology R&D Program of China (No.2007BAH08B04), Chongqing Key Technology R&D Program of China (No.2008AC20084), CSTC Research Program of Chongqing of China (No.2009BB2203) and Post-doctorial Science Foundation of China (No. 20090450091). © 2010 ACADEMY PUBLISHER AP-PROC-CS-10CN006 165 important phrase-based clustering, Suffix Tree Clustering (STC), is based on the Suffix Tree Document (STD) model which was proposed by Zamir and Etzioni[6]. The STC algorithm was used in their meta-searching engine to cluster the document snippets returned from other search engine in realtime. The algorithm is based on identifying the phrases that are common to groups of documents. By studying the STD model, Chim and Deng [5] find that this model can provide a flexible n-gram method to identify and extract all overlap phrases in the documents. They combine the advantages of STD and VSD models to develop an approach for document clustering. By mapping each node of a suffix tree except the root node into a unique dimension of an M-dimensional term space (M is the total number of nodes except the root node). each document is represented by a feature vector of M nodes. With these feature vectors, the cosine similarity measure is employed to compute the pairwise similarities of documents which are applied to the Group-average Hierarchical Agglomerative Clustering (GHAC) algorithm [8]. However, when applying these clustering methods to Web documents, the characteristics of the Web documents are ignored. HTML tags are used to designate different parts of the document and identify key parts based on this structure[4]. A Weighted Suffix Tree Document (WSTD) model is proposed to develop a novel approach for clustering Web documents. This approach consists of four parts: (1) Analyze the HTML document and assign different levels of significance to different document parts as structure weights. (2) Represent different parts of the documents as some sentences with levels of significance. The sentence is represented as a sequence of words, not characters. These sentences instead of documents are used to build WSTD model. Except the root node, each node of the WSTD model contains document ID, sentence ID with its level of significance, and the traversed times of each sentence. (3) Map each node into a unique dimension of an M-dimensional term space. Each document is represented by a feature vector of M nodes. The statistical features and significance levels of all nodes are taken into account of the feature term weights and similarity measures. (4) Apply the documents similarities to GHAC algorithm to cluster the Web documents. The rest of this paper is organized as follows: Web document structure analysis is presented in Section 2; Section 3 starts with the definition of the weighted suffix
Page 1 and 2:
Proceedings The Second Internationa
Page 3 and 4:
Table of Contents Message from the
Page 5 and 6:
Zuming Xiao, Zhan Guo, Bin Tan, and
Page 7 and 8:
Message from the Symposium Chairs T
Page 9 and 10:
Second International Symposium on N
Page 11 and 12:
ISBN 978-952-5726-09-1 (Print) Proc
Page 13 and 14:
model that can deal with time serie
Page 15 and 16:
ISBN 978-952-5726-09-1 (Print) Proc
Page 17 and 18:
training for the Wushu competition
Page 19 and 20:
ISBN 978-952-5726-09-1 (Print) Proc
Page 21 and 22:
information, called weak uncertain
Page 23 and 24:
Student side Student side Student s
Page 25 and 26:
ISBN 978-952-5726-09-1 (Print) Proc
Page 27 and 28:
If we define element of student as
Page 29 and 30:
ISBN 978-952-5726-09-1 (Print) Proc
Page 31 and 32:
QoS, each frame data is divided int
Page 33 and 34:
ISBN 978-952-5726-09-1 (Print) Proc
Page 35 and 36:
for Imaging Two and Three Phase Flo
Page 37 and 38:
ISBN 978-952-5726-09-1 (Print) Proc
Page 39 and 40:
A. Profiling&following control algo
Page 41 and 42:
Research and Realization about Conv
Page 43 and 44:
PDF document structure is a tree st
Page 45 and 46:
ISBN 978-952-5726-09-1 (Print) Proc
Page 47 and 48:
support 10 Mb / s. But ENC28J60 onl
Page 49 and 50:
ISBN 978-952-5726-09-1 (Print) Proc
Page 51 and 52:
condition the first byte output of
Page 53 and 54:
ISBN 978-952-5726-09-1 (Print) Proc
Page 55 and 56:
According to maximum membership deg
Page 57 and 58:
ISBN 978-952-5726-09-1 (Print) Proc
Page 59 and 60:
ubber according to the mass ratio o
Page 61 and 62:
Ⅲ. AN IMPROVED DNA ALGORITHM FOR
Page 63 and 64:
Ⅴ.CONCLUSION REMARKS DNA computer
Page 65 and 66:
its first child q 1 on the left, th
Page 67 and 68:
chains can greatly improve efficien
Page 69 and 70:
II. RELATED WORK A. Mobile Service
Page 71 and 72:
special services. SOAP is used to b
Page 73 and 74:
detection methods of DDoS attacks m
Page 75 and 76:
Step4 calculate the new subordinate
Page 77 and 78:
Figure 1. An analysis of the partit
Page 79 and 80:
ISBN 978-952-5726-09-1 (Print) Proc
Page 81 and 82:
The preceding three formulas can be
Page 83 and 84:
ISBN 978-952-5726-09-1 (Print) Proc
Page 85 and 86:
And the decay speed of buffer seque
Page 87 and 88:
ISBN 978-952-5726-09-1 (Print) Proc
Page 89 and 90:
distance of view point. Given that
Page 91 and 92:
ISBN 978-952-5726-09-1 (Print) Proc
Page 93 and 94:
Ⅳ. EVALUATION OF BLENDED LEARNING
Page 95 and 96:
ISBN 978-952-5726-09-1 (Print) Proc
Page 97 and 98:
symmetric with respect to the origi
Page 99 and 100:
ISBN 978-952-5726-09-1 (Print) Proc
Page 101 and 102:
each sample belongs to each categor
Page 103 and 104:
ISBN 978-952-5726-09-1 (Print) Proc
Page 105 and 106:
A. Data Preparing To generate our t
Page 107 and 108:
ISBN 978-952-5726-09-1 (Print) Proc
Page 109 and 110:
oth α and β . determination of Qu
Page 111 and 112:
ISBN 978-952-5726-09-1 (Print) Proc
Page 113 and 114:
Strong earthquake 0.1< M L
Page 115 and 116:
ISBN 978-952-5726-09-1 (Print) Proc
Page 117 and 118:
indirect causes, and the logical re
Page 119 and 120:
egression. Granger causality test i
Page 121 and 122:
B. Evaluation We have evaluated thi
Page 123 and 124: used as a source and neighboring ce
Page 125 and 126: Figure 3. Drainage networks generat
Page 127 and 128: Combinational logic unit failures i
Page 129 and 130: Tabal.1 Combinational fault logic t
Page 131 and 132: under the endorsement of both the m
Page 133 and 134: TABLE I. DESCRIPTION OF PROPOSITION
Page 135 and 136: Figure 2. The process of the invers
Page 137 and 138: Figure 9. (a)the original image.(b)
Page 139 and 140: ISBN 978-952-5726-09-1 (Print) Proc
Page 141 and 142: In order to reduce to the number of
Page 145 and 146: that studying being going to be to
Page 147 and 148: Reference[10] analyzed the evolutio
Page 149 and 150: module, communication module, apper
Page 151 and 152: After the comprehensive performance
Page 153 and 154: system testing can be seen that the
Page 155 and 156: III. AUTONOMIC RESOURCE ALLOCATION
Page 157 and 158: The QoE i (T i ) is the ith user’
Page 161 and 162: nodes will be formed one cluster, t
Page 165 and 166: Where n is the number of data point
Page 169 and 170: Keyboard event Application User mod
Page 173: SQL Azure will eventually include a
Page 177 and 178: The nodes in the suffix tree are dr
Page 179 and 180: [3] Y. Li, S. M. Chung, and J. D. H
Page 181 and 182: some drilling fluid produces hydrog
Page 183 and 184: "normalization", whose membership b
Page 185 and 186: and minimum structural elements in
Page 187 and 188: esults of the spatial transform par
Page 189 and 190: B. Research on protocol actions Com
Page 191 and 192: ACKNOWLEDGMENT This work is funded
Page 193 and 194: A. Problem Description In a small b
Page 195 and 196: [4] Huang, Hung, and J. Y. jen Hsu.
Page 197 and 198: Further by calculating, the followi
Page 199 and 200: TABLE II. THE CONCENTRATION BETWEEN
Page 201 and 202: private key that obtained by using
Page 205 and 206: ontology, and the domain dictionary
Page 209 and 210: Eq.4 is NP-hard and can be solved b
Page 213 and 214: Where: G i is the selection field o
Page 215 and 216: If there is X j in a generation, wh
Page 217 and 218: Theorem 2.4([10]). Let L 1 and L 2
Page 219 and 220: The relation between the lattice im
Page 221 and 222: Figure 2. Example of single-step de
Page 223 and 224: evocation and the fourth group stor
Page 225 and 226:
focused mainly on rule-based forms
Page 227 and 228:
Ⅵ. CONCLUSION Figure 6. The Flask
Page 229 and 230:
EDCF in comparison with DCF, has so
Page 231 and 232:
B. Simulation results and analysis
Page 233 and 234:
ISBN 978-952-5726-09-1 (Print) Proc
Page 235 and 236:
in routing table. When search resou
Page 237 and 238:
ISBN 978-952-5726-09-1 (Print) Proc
Page 239 and 240:
others through the selective emotio
Page 241 and 242:
such as weather information, techno
Page 243 and 244:
the opening window, a related page
Page 245 and 246:
1) Process context storage areas. I
Page 247 and 248:
V. CONCLUSION In this thesis, sCPU-
Page 249 and 250:
main types of horizontal search env
Page 251 and 252:
Enterprise Portal security provides
Page 253 and 254:
() t = [ S () t S () t ] T N S ,...
Page 255 and 256:
[2] Kumaravel, N., and Kavitha, V.,
Page 257 and 258:
output, so BP network has been wide
Page 259 and 260:
input to the artificial neural netw
Page 261 and 262:
(2) In a process (tokens from outsi
Page 263 and 264:
The reduction process consists of t
Page 265 and 266:
all eight normal vectors are identi
Page 267 and 268:
is B object , and the number of emp
Page 269 and 270:
xi, j y' = εα ( (1 + tanh( )) −
Page 271 and 272:
Error 8000 7000 6000 5000 4000 3000
Page 273 and 274:
Architecture (AMBA) a new bus archi
Page 275 and 276:
In order to ensure the smooth proce
Page 277 and 278:
ISBN 978-952-5726-09-1 (Print) Proc
Page 279 and 280:
In this paper, we adopt two Sobel o
Page 281 and 282:
ISBN 978-952-5726-09-1 (Print) Proc
Page 283 and 284:
four layers.The far right of the gr
Page 285 and 286:
ISBN 978-952-5726-09-1 (Print) Proc
Page 287 and 288:
TABLE II. OPERATIONAL EMPLOYEE TABL
Page 289 and 290:
A. Object-relational Type To audit
Page 291 and 292:
to execution time. Also, DBA would
Page 293 and 294:
Liping Chen .......................
show all

Download - Academy Publisher

Create successful ePaper yourself

Delete template?

Save as template?