18.02.2013 Views

knowledge · information · learning - Forschungszentrum L3S


INFORMATION

iSearch – Advanced Search Technology

The Search Engine Designed Exclusively for You!

Search engines have become the de facto entry point to large-volume information resources such as the WWW, enterprise webs, and individual data stores. Search technologies are critical to the success of search engines. The iSearch project aims to develop novel search algorithms that optimize existing search engines along multiple dimensions.

Motivation

With the increasing importance of search engines as the primary means of accessing all types of content, L3S Research Center is dedicated to developing innovative search technologies. Advanced search technologies involve issues from multiple areas, such as ranking, personalization, text mining, and spam combating. The iSearch project therefore focuses on developing intelligent search technologies by jointly considering the different dimensions of the search service.

Challenges

Several important challenges have been considered in iSearch in past years, such as how to enable personalized information access, how to exploit novel social media data, and how to search semi-structured and structured data. This year, iSearch has targeted the following new challenges:

• Extracting the true content of web pages and filtering out noise benefits web search. Effective approaches, including heuristic as well as statistical models, should be examined.

• Compared with text-based spam, image-based spam is even more difficult to combat. New characteristics of image spam should be studied.

• The presence of noise can violate the modeling assumptions of recommendation systems. Collaborative filtering techniques should be attack resistant.

The iSearch project will continue its work on creating innovative search algorithms and solutions by building upon its existing core competence and maintaining L3S Research Center’s competitiveness.

Highlights

Work in iSearch has led to various innovative solutions and, subsequently, to high-quality publications at top-rated international conferences. Highlights for the year 2008 are as follows:

• A new approach to segmenting HTML pages, building on methods from Quantitative Linguistics and strategies borrowed from the area of Computer Vision. The notion of text density is used as a measure to identify the individual text segments of a web page, reducing the problem to a 1D partitioning task.

• A new collaborative filtering algorithm based on SVD that is accurate as well as highly stable against shilling. This algorithm exploits previously established SVD-based shilling detection algorithms and combines them with SVD-based CF.

• Two methods for detecting image-based spam have been developed. The first solution, which uses the visual
