knowledge · information · learning - Forschungszentrum L3S
INFORMATION<br />
iSearch – Advanced Search Technology<br />
The Search Engine Designed Exclusively for You!<br />
Search engines have become the de facto entry point to<br />
large-volume <strong>information</strong> resources such as the WWW,<br />
enterprise webs, and individual data storage. Search technologies<br />
are critical to the success of search engines. The iSearch project<br />
aims to develop novel search algorithms that improve existing<br />
search engines along multiple dimensions.<br />
Motivation<br />
With the increasing importance of search engines as the<br />
primary means of accessing all types of content, <strong>L3S</strong><br />
Research Center is dedicated to developing innovative<br />
search technologies. Advanced search technologies involve<br />
issues from multiple areas, such as ranking, personalization,<br />
text mining, and spam combating. The iSearch project therefore<br />
focuses on developing intelligent search technologies by<br />
jointly considering the different dimensions of the search<br />
service.<br />
Challenges<br />
Several important challenges, such as how to enable personalized<br />
<strong>information</strong> access, how to exploit novel social<br />
media data, and how to search semi-structured and structured<br />
data, have been addressed in iSearch in past years. This year,<br />
iSearch has targeted new challenges, including:<br />
• Extracting the true content of web pages and filtering out<br />
noise benefits web search. Effective approaches, including<br />
heuristic as well as statistical models, should be examined.<br />
• Compared with text-based spam, image-based spam is<br />
even more difficult to combat. New characteristics of<br />
image spam should be studied.<br />
• The presence of noise can violate the modeling assumptions<br />
of recommendation systems. Collaborative<br />
filtering techniques should be attack-resistant.<br />
The iSearch project will continue its work on<br />
creating innovative search algorithms and solutions,<br />
building on its existing core competencies<br />
and maintaining <strong>L3S</strong> Research Center’s<br />
competitiveness.<br />
Highlights<br />
Work in iSearch has led to various innovative solutions and<br />
subsequently to high-quality publications at top-rated<br />
international conferences. Highlights for the year<br />
2008 are as follows:<br />
• A new approach to segmenting HTML pages, building on<br />
methods from Quantitative Linguistics and strategies<br />
borrowed from the area of Computer Vision. The notion<br />
of text-density is used as a measure to identify the<br />
individual text segments of a web page, reducing the<br />
problem to solving a 1D-partitioning task.<br />
• A new collaborative filtering (CF) algorithm based on SVD that<br />
is accurate as well as highly stable under shilling. The algorithm<br />
exploits previously established SVD-based shilling detection<br />
algorithms and combines them with SVD-based CF.<br />
• Two methods for detecting image-based spam have<br />
been developed. The first solution, which uses the visual