INFORMATION ACCESS VIA TOPIC HIERARCHIES AND THEMATIC ANNOTATIONS

Krishnapuram 2001] propose a framework for modelling asymmetric relations between data. All these recent works associate the notion of concept with a single term and rely on the construction of term hierarchies and the classification of documents within these hierarchies. In comparison, we propose two original contributions. The first is the extension of these approaches to the construction of a real concept hierarchy, where concepts are identified by sets of keywords rather than by a single term, all concepts being discovered from the corpus. These concepts better reflect the different themes and ideas that appear in documents, and they allow for a richer description than single terms. The second contribution is the automatic construction of a hierarchical organization of documents, also based on the “specialization/generalization” relation. This is described in Section 3.

For identifying concepts, we segment documents into homogeneous themes. We used the segmentation technique of [G. Salton et al. 1996], which relies on a similarity measure between successive passages in order to identify coherent segments. In [G. Salton et al. 1996], the segmentation method proceeds by decomposing texts into segments and themes. A segment is a block of text about one subject, and a theme is a set of such segments. In this approach, the segmentation begins at the paragraph level. Paragraphs are then compared with each other via a similarity measure.

For now, Semantic Web research focuses on standardization, the internal structuring of documents, and the sharing of ontologies in a variety of domains, with the short-term goal of transforming existing sources into a machine-understandable form (i.e. RDF). Researchers in the field realise that this language is not intuitive for ordinary users and that it is difficult for them to understand formal ontologies and defined vocabularies.
They therefore advocate annotation in natural language, which is more intuitive for humans, as another means of semantic augmentation. [B. Katz, J. Lin, 2002] propose a method to augment RDF with natural language in order to make RDF friendlier to humans and to facilitate the adoption of the Semantic Web by many users. Our approach is complementary to their work. Indeed, at the end of our structuring algorithm we have derived all the concepts present in the collection and, for each document, the set of concepts it is about. We are then able to annotate each document with its set of concepts. Each concept is represented by a set of keywords in the corpus language.

3 AUTOMATIC CONSTRUCTION OF TOPICS AND DOCUMENTS HIERARCHIES

3.1 Basic Ideas

This work started while studying the automatic derivation of typed “specialization/generalization” links between the documents of a corpus. A link from document D1 to document D2 is of type specialization (and the link from D2 to D1 of type generalization) if D2 is about specific themes of D1. For example, D1 is about war in general and D2 is about the First World War in particular. This type of relation makes it possible to build a hierarchical organization of the concepts present in the corpus, which in turn allows for the construction of a hierarchical organization of the corpus itself.

In hierarchies like Yahoo!, the concepts used to organize documents are reduced to words. This gives only basic indications of the content of a document, and the corresponding hierarchies are relatively poor. For this reason, we have tried to automatically construct hierarchies in which each concept is identified by a set of words. In order to do this, we need to know all the themes present in the collection and the specialization/generalization relations that exist among them. From now on, we will identify a concept with a set of keywords.

For identifying the concepts present in a document, we use Salton's segmentation method [G. Salton et al. 1996], which outputs a set of themes extracted from the corpus. Each theme is identified by a set of representative keywords. For the detection of specialization/generalization relations between detected concepts, we first build a term hierarchy as in [Mark Sanderson, Bruce Croft, 1999], and we then construct a concept hierarchy from it. After that, documents may be associated with the relevant concepts in this hierarchy, thus producing a document hierarchy based on the “specialization/generalization” relation between documents.

To summarize, the method is built around three main steps:
• Find the set of concepts of a given corpus
• Build a hierarchy (of type specialization/generalization) of these concepts
• Project the documents into the concept hierarchy and infer typed “specialization/generalization” links between documents.
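The term hierarchy mentioned above can be illustrated with a minimal sketch of a subsumption test in the spirit of [Mark Sanderson, Bruce Croft, 1999]. The rule used here (term x generalizes term y when P(x|y) is at least a threshold and P(y|x) is below 1), the default threshold of 0.8, the function name `term_subsumptions`, and the toy corpus are assumptions made for illustration; they are not taken from this paper.

```python
from collections import defaultdict
from itertools import permutations


def term_subsumptions(docs, threshold=0.8):
    """Return pairs (x, y) where term x is taken to generalize term y.

    docs: iterable of sets of terms, one set per document.
    Assumed rule: P(x | y) >= threshold and P(y | x) < 1, with the
    probabilities estimated from document co-occurrence counts.
    """
    # Map each term to the set of documents in which it occurs.
    doc_ids = defaultdict(set)
    for i, terms in enumerate(docs):
        for t in terms:
            doc_ids[t].add(i)

    pairs = set()
    for x, y in permutations(doc_ids, 2):
        both = len(doc_ids[x] & doc_ids[y])
        p_x_given_y = both / len(doc_ids[y])
        p_y_given_x = both / len(doc_ids[x])
        if p_x_given_y >= threshold and p_y_given_x < 1.0:
            pairs.add((x, y))
    return pairs


# Hypothetical toy corpus: "war" occurs wherever the other terms occur.
docs = [
    {"war", "army"},
    {"war", "trench", "army"},
    {"war", "trench"},
    {"war", "treaty"},
]
term_pairs = term_subsumptions(docs)
```

On this toy corpus, "war" comes out as more general than "army", "trench" and "treaty", while no relation is inferred between "army" and "trench", which co-occur only in part.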
ENTERPRISE INFORMATION SYSTEMS VI

3.2 Algorithm

3.2.1 Concept Extraction from a Corpus

The goal here is to detect the set of concepts within the corpus and the words that represent them. For that, we extend Salton's work on text segmentation. We decompose a document into semantic themes using Salton's method [G. Salton et al. 1996], which can be viewed as a clustering of the document's paragraphs. Each document being decomposed into a set of semantic themes, we then cluster all the themes of all documents in order to retain the minimal set of themes that ensures a correct coverage of the corpus. For each concept, we find the set of words that represent it. A concept is represented here by its most frequent words.

3.2.2 Building the Concepts Hierarchy

The next step is to detect the “specialization/generalization” relations between the extracted concepts so as to infer the concept hierarchy. First, we build the hierarchy of terms within the corpus using Croft and Sanderson's subsumption method [Mark Sanderson, Bruce Croft, 1999]. Then we create a concept hierarchy as follows. For each pair of concepts C1 and C2, we compute from the term hierarchy the percentage x of words of concept C2 generalized by words of concept C1, and the percentage y of words of C1 generalized by words of C2. If x > S1 > S2 > y (where S1 and S2 are thresholds), then we deduce a specialization/generalization relation between these concepts (C1 generalizes C2) (see footnote 1).

After that, we have a hierarchical organization of concepts. It is therefore possible to attach indexed documents to the nodes of the hierarchy. One document may belong to several nodes if it is concerned with several concepts. Note that not all concepts are comparable by this “specialization/generalization” relation. At this stage we already have an interesting organization of the corpus, which relies on richer semantics than those offered by classical portals or by term hierarchies [Lawrie and Croft 2000, Sanderson and Croft 1999].
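The threshold rule x > S1 > S2 > y of Section 3.2.2 can be sketched as follows. This is only an illustration under stated assumptions: the function names `coverage` and `generalizes`, the threshold values 0.5 and 0.2, the use of fractions in [0, 1] instead of percentages, and the choice not to count a word shared by both concepts as generalized are all hypothetical, since the text does not specify them.

```python
def coverage(general, specific, term_pairs):
    """Fraction of words in `specific` generalized by some word in `general`.

    term_pairs: set of (x, y) pairs meaning "term x generalizes term y",
    as read off a term hierarchy.
    """
    if not specific:
        return 0.0
    covered = sum(
        1 for w in specific if any((g, w) in term_pairs for g in general)
    )
    return covered / len(specific)


def generalizes(c1, c2, term_pairs, s1=0.5, s2=0.2):
    """True when concept C1 generalizes concept C2 (x > S1 > S2 > y).

    s1 and s2 are illustrative threshold values, not values from the paper.
    """
    x = coverage(c1, c2, term_pairs)  # share of C2's words generalized by C1
    y = coverage(c2, c1, term_pairs)  # share of C1's words generalized by C2
    return x > s1 > s2 > y


# Hypothetical term hierarchy and concepts (sets of keywords).
example_pairs = {("war", "battle"), ("war", "trench"), ("army", "soldier")}
concept_war = {"war", "army"}
concept_ww1 = {"battle", "trench", "soldier"}
war_generalizes_ww1 = generalizes(concept_war, concept_ww1, example_pairs)
ww1_generalizes_war = generalizes(concept_ww1, concept_war, example_pairs)
```

Because the rule requires S1 > S2, the test cannot succeed in both directions for the same pair of concepts: one direction needs x above S1 while the other would need x below S2.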
However, we can go further and establish “specialization/generalization” links between the documents of the corpus, as explained below.

Footnote 1: Note that we briefly describe in Section 6 an alternative method for directly building concept hierarchies without the need to first build the term hierarchy.

3.2.3 “Specialization/Generalization” Relation Between Documents

Each document may be indexed by the set of corpus concepts and annotated with the keywords of the concepts relevant to its content. We then proceed in a similar way as for building the concept hierarchy from the term hierarchy. For each pair of documents D1, D2, we compute from the concept hierarchy the percentage of the concepts of D2 generalized by the concepts of D1, and vice versa. This allows us to infer a “specialization/generalization” relation between the two documents.

Note that this is a global relation between two documents, but we could also envisage relations between parts of documents. In particular, our “specialization/generalization” relation excludes the possibility that two documents generalize one another, which could happen when they deal with different concepts: D1 could be a specialization of D2 for concept C1 and a generalization of D2 for concept C2. We made this hypothesis for simplification.

Instead of building a hierarchy of documents, we could use the “specialization/generalization” relation to indicate links between documents. Such links could also be built between the document segments identified during the first step of the algorithm. This would result in an alternative representation of the document collection.

4 OUR ALGORITHM AND SEMANTIC WEB

This section discusses how our algorithm answers some questions of Semantic Web research. Semantic Web research can be viewed as an attempt to address the problem of information access by building programs that help users to locate, collect, and compare document contents.
From this point of view, our structuring algorithm addresses some of these problems:
• The set of themes extracted from the collection, where each theme has a label in natural language, is a good synthesis of the collection for the user.
• If one of the themes interests the user, the corresponding hierarchy node gives access to all the documents dealing with that theme, and each document carries an annotation that reflects all the themes it contains. This allows the user to target the subset of documents likely to be of interest.
• All documents are annotated by the themes they are about. The annotations are in natural