
the step of concept extraction, all documents are indexed with their concepts).

6.2.2 Hierarchy Similarities

In this section we compare the hierarchies induced by our method, as well as the term hierarchies, to the original hierarchies, using the measures of Section 5.

Table 2: Similarities between the LookSmart and NewScientist data and the other hierarchies (term, concept, concept_version2).

LookSmart             Terms   Concepts1   Concepts2
Inclusion             0.4     0.46        0.65
Mutual Information    0.3     0.6         0.7

NewScientist          Terms   Concepts1   Concepts2
Inclusion             0.3     0.2         0.6
Mutual Information    0.2     0.2         0.65

INFORMATION ACCESS VIA TOPIC HIERARCHIES AND THEMATIC ANNOTATIONS

The concept hierarchy is large compared to the original ones, and only a few documents are assigned to each concept. The greater width of the concept hierarchy is due to the fact that some themes detected through corpus segmentation are not present in the original hierarchies, which exploit poorer conceptual representations.

Nevertheless, the similarity between our hierarchy and LookSmart's is quite high: the inclusion similarity is about 0.5, and the similarity based on mutual information is around 0.6 (Table 2). However, the similarity between our hierarchy and the NewScientist one is low. This result points out the weakness, when the data are heterogeneous, of the subsumption method our algorithm is based on. We therefore modified our algorithm so that computing the subsumption relation between concepts no longer depends on the induced term hierarchy. Remember that, in our definition, concept C1 subsumes concept C2 if most terms of C2 are subsumed by terms from C1.
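The term-based concept subsumption test just described can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 0.8 thresholds, the function names, and the toy index are assumptions, and term-level subsumption is approximated here by a co-occurrence test (t1 subsumes t2 when P(t1 | t2) is high) rather than by the induced term hierarchy itself.

```python
# Hypothetical sketch: concept C1 subsumes concept C2 if "most" terms of
# C2 are subsumed by some term of C1. Thresholds (0.8) are assumptions.

def term_subsumes(t1, t2, docs_with, threshold=0.8):
    """Co-occurrence test: t1 subsumes t2 if P(t1 | t2) >= threshold."""
    d2 = docs_with.get(t2, set())
    if not d2:
        return False
    return len(docs_with.get(t1, set()) & d2) / len(d2) >= threshold

def concept_subsumes(c1_terms, c2_terms, docs_with, most=0.8):
    """C1 subsumes C2 if most terms of C2 are subsumed by a term of C1."""
    covered = sum(
        1 for t2 in c2_terms
        if any(term_subsumes(t1, t2, docs_with) for t1 in c1_terms)
    )
    return covered / len(c2_terms) >= most

# Toy inverted index: term -> set of ids of documents containing it
docs_with = {
    "science": {1, 2, 3, 4},
    "physics": {1, 2},
    "quantum": {1, 2},
}
print(concept_subsumes({"science"}, {"physics", "quantum"}, docs_with))  # True
```

Note the asymmetry: "science" co-occurs with every document containing "physics", but not the other way round, so subsumption only holds in one direction.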

This relation was inferred from the term hierarchy (Section 3.2.2). However, it is also possible to derive the concept hierarchy directly, without relying on the term hierarchy. To do so, we directly estimate P(concept Ci | concept Cj) as the number of documents containing both concepts divided by the number of documents containing concept Cj.
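The direct estimate above can be sketched in a few lines. The formula for P(Ci | Cj) is the one given in the text; the 0.8 decision threshold and all names are illustrative assumptions, not values from the paper.

```python
# Sketch of the direct concept-subsumption estimate:
#   P(Ci | Cj) = |docs containing both Ci and Cj| / |docs containing Cj|
# The 0.8 threshold is an assumption, not a value given in the text.

def p_concept(ci, cj, doc_concepts):
    """Estimate P(ci | cj) from document-level concept annotations."""
    with_cj = [c for c in doc_concepts if cj in c]
    if not with_cj:
        return 0.0
    both = sum(1 for c in with_cj if ci in c)
    return both / len(with_cj)

def subsumes(ci, cj, doc_concepts, threshold=0.8):
    """ci subsumes cj when cj's presence (almost) always implies ci's."""
    return p_concept(ci, cj, doc_concepts) >= threshold

# Toy corpus: each document is the set of concepts it was indexed with
docs = [{"A", "B"}, {"A", "B"}, {"A"}, {"A", "C"}, {"C"}]
print(p_concept("A", "B", docs))  # 2 docs contain B, both contain A -> 1.0
print(subsumes("A", "B", docs))   # True
```

Unlike the term-mediated definition, this estimate guarantees the implication discussed below: if P(Ci | Cj) is close to 1, the presence of Cj in a document really does imply the presence of Ci.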

This hierarchy (denoted Concepts2 in Table 2) seems closer to the manual hierarchy. It detects fewer subsumption relations between concepts on the LookSmart data; therefore it is less wide than the first concept hierarchy. Why does the number of subsumption relations between concepts drop with the second method? A reason might be that, in the construction of the first concept hierarchy, concept C2 is generalized by concept C1 if most of the terms of C2 are generalized by C1's terms. Let us take the extreme case where only one word w1 of C1 generalizes the C2 terms. In this case, we will say that C1 generalizes C2. Actually, we can say that the presence of C2 in a document implies the presence of w1, but it is not certain that it implies the presence of C1. For the second concept hierarchy, the subsumption of C2 by C1 ensures that C2 implies C1.

For the NewScientist data, due to the heterogeneity of the vocabulary, the subsumption test fails for many pairs of words, and this effect is more drastic when projecting themes onto the term hierarchy. The consequence is that many theme nodes are composed of a single document; therefore the hierarchy is far from the original one. Modifying the definition of concept subsumption gives a hierarchy more similar to the original one. One way to reduce the influence of vocabulary heterogeneity is to consider synonyms in the computation of P(term1|term2).
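The synonym extension suggested above could be realized by letting a document "contain" a term whenever it contains the term or one of its synonyms before estimating P(term1 | term2). This is only a sketch of that idea; the synonym table, names, and co-occurrence estimate are all illustrative assumptions.

```python
# Hypothetical sketch: synonym-aware estimation of P(term1 | term2).
# A document counts as containing a term if it contains the term itself
# or any of its synonyms. The synonym table is purely illustrative.

def docs_matching(term, docs, synonyms):
    """Ids of documents containing `term` or one of its synonyms."""
    variants = {term} | synonyms.get(term, set())
    return {i for i, words in enumerate(docs) if variants & words}

def p_term(t1, t2, docs, synonyms):
    """P(t1 | t2) estimated over synonym-expanded occurrences."""
    d2 = docs_matching(t2, docs, synonyms)
    if not d2:
        return 0.0
    return len(docs_matching(t1, docs, synonyms) & d2) / len(d2)

docs = [{"car", "engine"}, {"automobile", "engine"}, {"bicycle"}]
synonyms = {"car": {"automobile"}}
print(p_term("car", "engine", docs, synonyms))  # 1.0
```

Without the synonym table, "car" would match only one of the two engine documents and P(car | engine) would drop to 0.5, which is exactly the kind of heterogeneity-induced failure described above.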

These experiments shed some light on the algorithm's behaviour. The hierarchies we obtain (particularly those obtained with the second method) are coherent compared to the LookSmart and NewScientist hierarchies, particularly with respect to the groups of documents detected, but some of the document pairs sharing the "Parent-Child" relation in the concept hierarchy do not appear in the LookSmart hierarchy. This is inherent in the difference in nature between the two hierarchies.

If we compare the automatically built term hierarchy with that of LookSmart, we see that the inclusion similarity is 0.4 and the mutual information is 0.3. Both hierarchies use terms to index and organize documents. However, the term hierarchy uses all terms in the collection, whereas LookSmart uses a much smaller vocabulary. Therefore the term hierarchy is very large compared to LookSmart's. Nevertheless, some groups of documents are still common to the two hierarchies.

The similarity of the concept hierarchy with LookSmart seems higher than that of the term hierarchy.

7 CONCLUSIONS AND PERSPECTIVES

We have described a method to automatically generate a hierarchical structure from a document collection.

The same method can be used to build specialization/generalization links between documents
