Proceedings

More documents

Recommendations

Info

| 126 Abstract Developing Intelligent Search Engines Developers of search engines today do not only face technical problems such as designing an efficient crawler or distributing search requests among servers. Search has become a problem of identifying reliable information in an adversarial environment. Since the web is used for purposes as diverse as trade, communication, and advertisement search engines need to be able to distinguish different types of web pages. In this paper we describe some common properties of the WWW and social networks. We show one possibility of exploiting these properties for classifying web pages. 1 Introduction Since more and more data is made available online, users have to search an ever growing amount of web pages to find the information they seek. In the last decade search engines have become an important tool to find valuable web sites for a given query. First search engines did rely on simply computing the similarity of query and page content to find the most relevant sites. Today, most engines incorporate some external relevance meassure, like the page rank [23], to determine the correct ranking of web pages. Intuitively, each web page gets some initial “page rank”, collects additional weight via its inlinks and evenly distributes the gathered rank to all pages it links to. Thus, pages that have either many inlinks from unimportant pages or at least some links from sites already considered important are ranked high. Nowadays the WWW is not only used to publish information or research results as was done in its very beginning. Many web pages we encounter are created to communicate, trade, organise events or to promote products. The question arises whether it is possible to identify certain types of web pages automatically and augment the search results with Isabel Drost 1 this information. In this paper we investigate certain properties of the web graph that can also be found in social networks. These properties distinguish natural occuring graphs from synthetic ones and can for instance be used to identify link spam [14]. The paper gives a short overview of how the link graph can be used to distinguish certain types of web pages with machine learning techniques. Basic notation conventions are introduced in chapter 2. An overview of different link graph properties is given in 3, chapter 4 deals with the classification of different web page types. Open problems are presented in chapter 5. 2 Notation The world wide web can be represented as a graph G = V,E. Each page corresponds to one node (also referred to as vertice) vi in the graph. Each link eij from page i to j is represented as an edge. The outdegree of vi corresponds to the number of links originating from this node (outlinks), its indegree to the number of links pointing to this page (inlinks). We refer to pages linking to vi, as well as those linked by vi by the term link neighborhood. 3 WWW as Social Network Social networks such as graphs representing relationships between humans or biological constraints can be shown to differ from synthetic networks in many properties [18, 22, 24]. In the following we shed light on a selection of differences that can be exploited to distinguish different types of web pages.
DEVELOPING INTELLIGENT SEARCH ENGINES 3.1 Power Law Distribution The distribution of the nodes’ indegree in social networks is power law governed [12, 2]. Intuitively speaking this means, that a large proportion of nodes have few links pointing to them whereas there are only a few pages that attract large amounts of links. The indegree distribution of web pages should also be exhibit a power law. Yet in [12] empirical studies have shown that the actual distribution has several outliers. Examining this problem more deeply, the authors found the outliers being link spam in most of the cases. They concluded that this observation might help in designing a link spam classifier. 3.2 Clustering Coefficient Given a node xi in an undirected graph, the clustering coefficient of this node gives the proportion of existing links among its neighbors vs. all links that theoretically could exist. In [18] Newman observed that this coefficient is higher in naturally occuring networks than in synthetic networks: Two nodes both linked to a third one are also linked to each other with high probability in naturally occuring networks. For the WWW no clear decission could be made on whether its average clustering coefficient is similar to the one in natural networks. In our work however [10] we observed that the local clustering coefficient can be exploited successfully to distinguish spam from ham web pages. The local clustering coefficient simply gives the probability of a link between two randomly drawn link neighbors of one web page. 3.3 Small World Graphs The most commonly known property of small world graphs is, that the shortest path connecting two nodes is considerably smaller than in random graphs. The concept became widely known after Milgrams publication [20] that suggested that US citizens are on average connected to each other by a path of 6 intermediate nodes. The WWW also should reveal such properties. But what we observe in reality is a deviation from these statistics: Large networks of link spam de- 2 teriorate the small world properties of the WWW [7, 6]. 4 Classifying Web Pages In this section we treat the problem of classifying web pages. We incorporate the local link structure of the example nodes into their feature representation. To validate the expressivity of this representation we apply this strategy for classifying link spam. 4.1 Representing Examples Each web page is represented by three feature sets. The first one corresponds to intrinsic properties such as the length of the example page to classify. The second set covers averaged and summed features of the link neighborhood, such as the average length of pages linking to the example to classify. The last set of features covers the relations among neighboring pages and the example such as the average number of pages among the inlink pages with same length as the example vertice. A detailed description of the features used is given in table 1. Many of these features are reimplementations of, or have been inspired by, features suggested by [9, 10] and [11]. 4.2 Classifying Web Spam In our experiments, we study learning curves for the tfidf representation, the attributes of Table 1, and the joint features. Figure 1 shows that for all spam, combined features are superior to the tfidf representation. In [10], a ranking of features according to their importance for classification is given. The most important features all cover attributes of either the neighboring pages themselves as well as context similarity features. Also the clustering coefficient of the pages to classify ranges rather high in the feature ranking. This publication also examines the behavior of the classifier in an adversarial environment: Spammers adopt the structure of their web pages as soon as a new spam classifier is employed. The experiments show that our link spam classifier needs to be retrained quickly, in case spammers start to adopt their web pages. 127 |
Page 2 and 3:
| 2
Page 5 and 6:
22. Chaos Communication Congress Pr
Page 7 and 8:
REPORTS 7 5 Thesis on Informational
Page 9 and 10:
10 5 Thesis on Informational- Cogni
Page 11 and 12:
5 THESIS ON INFORMATIONAL-COGNITIVE
Page 13 and 14:
Page 15 and 16:
Page 17:
Page 20 and 21:
UQBTng: a tool capable of automatic
Page 22 and 23:
IV. UQBT MODIFICATIONS A. Summary A
Page 24 and 25:
• if LHS of A is exactly sem str,
Page 27 and 28:
Advanced Buffer Overflow Methods [o
Page 29 and 30:
ADVANCED BUFFER OVERFLOW METHODS #i
Page 31 and 32:
ADVANCED BUFFER OVERFLOW METHODS 0x
Page 33 and 34:
ADVANCED BUFFER OVERFLOW METHODS in
Page 35:
ADVANCED BUFFER OVERFLOW METHODS }
Page 38 and 39:
| 34 Eine Führung in das europäis
Page 40 and 41:
| 36 Die kryptische Kennung 10 eine
Page 42 and 43:
| 38 senvertreter angewiesen. Die
Page 44 and 45:
| 40 Anonymous Data Broadcasting by
Page 46 and 47:
| 42 operate, consider the setting,
Page 48 and 49:
| 44 Figure 1: Anonymous data broad
Page 51 and 52:
Autodafé: An Act of Software Tortu
Page 53 and 54:
AUTODAFE - AN ACT OF SOFTWARE TORTU
Page 55 and 56:
Page 57 and 58:
Page 59 and 60:
Page 61 and 62:
Bad TRIPs What the WTO Treaty did i
Page 63 and 64:
BAD TRIPS The TRIPS is especially h
Page 65:
BAD TRIPS Extension of the transiti
Page 68 and 69:
| 64 Collateral Damage Consequences
Page 70 and 71:
| 66 If one considers a DNSBL listi
Page 72 and 73:
| 68 • A number of sites do not h
Page 74 and 75:
| 70 to use other mailing software.
Page 76 and 77:
| 72 These kinds of checks tend to
Page 79 and 80: COMPLETE Hard Disk Encryption with
Page 81 and 82: COMPLETE HARD DISK ENCRYPTION WITH
Page 105: COMPLETE HARD DISK ENCRYPTION WITH
Page 108 and 109: | 104 x86-64 buffer overflow exploi
Page 110 and 111: | 106 3 ELF64 LAYOUT AND X86-64 EXE
Page 112 and 113: | 108 4 THE BORROWED CODE CHUNKS TE
Page 114 and 115: | 110 5 AND DOES THIS REALLY WORK?
Page 116 and 117: | 112 6 SINGLE WRITE EXPLOITS 9 48
Page 118 and 119: | 114 6 SINGLE WRITE EXPLOITS 11 34
Page 120 and 121: | 116 7 AUTOMATED EXPLOITATION 13 A
Page 122 and 123: | 118 7 AUTOMATED EXPLOITATION 15 W
Page 124 and 125: | 120 8 RELATED WORK 17 32 read(soc
Page 126 and 127: | 122 11 CREDITS 19 11 Credits Than
Page 129: Developing Intelligent Search Engin
Page 133 and 134: DEVELOPING INTELLIGENT SEARCH ENGIN
Page 135 and 136: Digital Identity and the Ghost in t
Page 137 and 138: DIGITAL IDENTITY AND THE GHOST IN T
Page 145: DIGITAL IDENTITY AND THE GHOST IN T
Page 148 and 149: | 144
Page 150 and 151: | 146
Page 153 and 154: Esperanto, die internationale Sprac
Page 155 and 156: ESPERANTO, DIE INTERNATIONALE SPRAC
Page 157 and 158: ESPERANTO, DIE INTERNATIONALE SPRAC
Page 159 and 160: Fair Code Free/Open Source Software
Page 161 and 162: FAIR CODE 2. Von Digital Divide zu
Page 163 and 164: FAIR CODE ökonomisch schlecht gest
Page 165 and 166: FAIR CODE menschliches Verhalten re
Page 167: FAIR CODE gewährleistet, und nur d
Page 170 and 171: ��
Page 172 and 173: | 168 ��
Page 174 and 175: | 170 ��
Page 176 and 177: | 172 ��
Page 178 and 179: | 174 ��
Page 180 and 181:
| 176 ��
Page 183 and 184:
Fuzzing Breaking software in an aut
Page 185 and 186:
FUZZING or protocol and fuzzing fra
Page 187 and 188:
FUZZING #define SOME_MAX 4096 if (y
Page 189 and 190:
Geometrie ohne Punkte, Geraden & Eb
Page 191 and 192:
GEOMETRIE OHNE PUNKTE, GERADEN & EB
Page 193:
GEOMETRIE OHNE PUNKTE, GERADEN & EB
Page 196 and 197:
| 192
Page 198 and 199:
| 194
Page 200 and 201:
| 196
Page 203 and 204:
Hacking OpenWRT Felix Fietkau 199 |
Page 205 and 206:
HACKING OPENWRT 1 Introduction to O
Page 207 and 208:
HACKING OPENWRT 3.2 Makefile This f
Page 209 and 210:
HACKING OPENWRT If your package is
Page 211 and 212:
HACKING OPENWRT 4.2 toolchain/ tool
Page 213 and 214:
HACKING OPENWRT To add a new platfo
Page 215 and 216:
Hopalong Casualty On automated vide
Page 217 and 218:
HOPALONG CASUALTY 2 Intro to Visual
Page 219 and 220:
HOPALONG CASUALTY Appearance As hum
Page 221 and 222:
HOPALONG CASUALTY A thorough review
Page 223 and 224:
Hosting a Hacking Challenge - CTF-s
Page 225 and 226:
HOSTING A HACKING CHALLENGE - CTF-S
Page 227 and 228:
Page 229:
Page 232 and 233:
| 228 1 Introduction Intrusion Dete
Page 234 and 235:
| 230 • Changing file owner / gro
Page 236 and 237:
| 232 Figure 2: Simple correlation
Page 239 and 240:
Lyrical I Abschluss des CCC-Poesie-
Page 241 and 242:
Magnetic Stripe Technology Joseph B
Page 243 and 244:
MAGNETIC STRIPE TECHNOLOGY black re
Page 245 and 246:
MAGNETIC STRIPE TECHNOLOGY My curre
Page 247 and 248:
MAGNETIC STRIPE TECHNOLOGY MetroCar
Page 249 and 250:
��
Page 251 and 252:
Code Listing (dmsb.c) ��
Page 253 and 254:
Memory allocator security Yves Youn
Page 255 and 256:
MEMORY ALLOCATOR SECURITY 251 |
Page 257 and 258:
MEMORY ALLOCATOR SECURITY 253 |
Page 259 and 260:
Message generation at the info laye
Page 261 and 262:
MESSAGE GENERATION AT THE INFO LAYE
Page 263 and 264:
Page 265 and 266:
Page 267 and 268:
Page 269:
Page 272 and 273:
| 268 Abstract Open Source, EU Fund
Page 274 and 275:
| 270 lines of code x1e5 1.8 1.6 1.
Page 276 and 277:
| 272 cooperation. Getting the new
Page 278 and 279:
| 274 PyPy - The new Python Impleme
Page 280 and 281:
| 276 performs abstract interpretat
Page 282 and 283:
| 278 adding short cuts for some co
Page 284 and 285:
| 280
Page 286 and 287:
| 282
Page 288 and 289:
| 284
Page 290 and 291:
| 286
Page 292 and 293:
| 288
Page 294 and 295:
| 290
Page 296 and 297:
| 292
Page 298 and 299:
| 294 Die Risiken der Nanotechnik N
Page 300 and 301:
| 296 Neue Materialien, einer der P
Page 302 and 303:
| 298 wahrscheinlich nicht ausreich
Page 304 and 305:
| 300 „State of the Future 2005
Page 307 and 308:
The Web according to W3C How to tur
Page 309 and 310:
THE WEB ACCORDING TO W3C 305 |
Page 311 and 312:
THE WEB ACCORDING TO W3C 307 |
Page 313:
THE WEB ACCORDING TO W3C our proces
Page 316 and 317:
| 312 Transparenz der Verantwortung
Page 318 and 319:
| 314 Transparenz der Verantwortung
Page 320 and 321:
| 316 (6) Wird dem Betroffenen kein
Page 322 and 323:
| 318 „anonymes Staatshandeln“
Page 324 and 325:
| 320 Die Einschränkungen der Grun
Page 326 and 327:
| 322 11 wie es insbesondere an Hei
Page 328 and 329:
��
Page 330 and 331:
��
Page 332 and 333:
| 328 ��
Page 335 and 336:
Wargames - Hacker Spielen Männlich
Page 337 and 338:
WARGAMES - HACKER SPIELEN 333 |
Page 339 and 340:
Page 341 and 342:
Page 343 and 344:
Page 345 and 346:
Page 347 and 348:
Page 349 and 350:
Page 351 and 352:
Page 353 and 354:
Page 355 and 356:
Was ist technisches Wissen? Philoso
Page 357 and 358:
WAS IST TECHNISCHES WISSEN? ��
Page 359 and 360:
Page 361 and 362:
Page 363 and 364:
Page 365 and 366:
Page 367 and 368:
��
Page 369 and 370:
��
Page 371 and 372:
Page 373:
Page 376 and 377:
| 372 ��
Page 378 and 379:
��
Page 380 and 381:
��
Page 382 and 383:
| 378
Page 384 and 385:
| 380
Page 386 and 387:
| 382
Page 388 and 389:
| 384
Page 390 and 391:
| 386
Page 392 and 393:
| 388
Page 394:
| 390
show all

Proceedings

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?