Dr. Xu Yabo, School of Software, Sun Yat-sen University
Search
the Next
Google, 1998
Baidu, 2000
……
PageRank
WHAT IS NEXT
2
Eric Schmidt, Google CEO
September 3, 2009
<br />
3
离 <br />
Entering a query: 9 seconds
Selecting a result: 15 seconds
Selecting another result: 15 seconds
……
Network time: 400 ms
Serving results: ~300 ms
Network time: 400 ms
Read results
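As a rough worked example of the figures above, the sketch below adds up the user-side and system-side times of one search session. The durations come from the slide; the "(request)" and "(response)" labels on the two network legs are an assumption for readability, and "Read results" is left out because the slide gives no number for it.

```python
# Durations in seconds, taken from the slide.
USER_STEPS = {
    "entering a query": 9.0,
    "selecting a result": 15.0,
    "selecting another result": 15.0,
}
SYSTEM_STEPS = {
    "network time (request)": 0.4,   # label is an assumption
    "serving results": 0.3,          # the slide says ~300 ms
    "network time (response)": 0.4,  # label is an assumption
}


def totals(user_steps, system_steps):
    """Return (user seconds, system seconds, system share of the total)."""
    user = sum(user_steps.values())
    system = sum(system_steps.values())
    return user, system, system / (user + system)


user_s, system_s, system_share = totals(USER_STEPS, SYSTEM_STEPS)
print(f"user time:   {user_s:.1f} s")
print(f"system time: {system_s:.1f} s")
print(f"system share of the session: {system_share:.1%}")
```

Under these numbers the system accounts for only a few percent of the session, which is why the remaining slides focus on the time the user spends typing and choosing results.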
<br />
4
Google Instant Search
People type slowly but read fast.
Search as you type.
Saves 2-5 seconds per search.
5
Technical Challenges
Instant Search = Query Prediction + Continuous Search Results Updating
= 5-7x the query load (1 billion searches per day already)
Google's solution:
New cache mechanisms
Re-use of search results across all requests
Optimization of JavaScript on the client side
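The decomposition above (query prediction plus continuously updated results, backed by a shared cache) can be sketched roughly as follows. This is a toy illustration, not Google's implementation: the query log, the `predict`, `search` and `instant_search` functions, and the in-process `lru_cache` standing in for a result cache are all assumptions.

```python
from functools import lru_cache

# Hypothetical query log: query -> number of times it was searched.
QUERY_COUNTS = {
    "weather": 50,
    "weather guangzhou": 30,
    "web search": 20,
    "wechat": 40,
}

BACKEND_CALLS = []  # records every query that actually reaches the backend


def predict(prefix):
    """Query prediction: the most frequent logged query starting with prefix."""
    candidates = [q for q in QUERY_COUNTS if q.startswith(prefix)]
    if not candidates:
        return None
    return max(candidates, key=lambda q: QUERY_COUNTS[q])


@lru_cache(maxsize=1024)
def search(query):
    """Stand-in for the expensive search backend; results are cached."""
    BACKEND_CALLS.append(query)
    return (f"result 1 for {query}", f"result 2 for {query}")


def instant_search(typed_so_far):
    """Run on every keystroke: predict a query, then fetch its cached results."""
    predicted = predict(typed_so_far)
    if predicted is None:
        return None
    return predicted, search(predicted)


# Simulate a user typing "weath" one character at a time.
for i in range(1, len("weath") + 1):
    instant_search("weath"[:i])

print(BACKEND_CALLS)  # ['weather']
```

Five keystrokes trigger five result updates but only one backend call, because every prefix predicts the same query and the cached result is reused.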
6
Baidu's Answer
+ 开 <br />
7
Baidu's Answer
+ 开 <br />
8
9
Entering a query: 9 seconds
Selecting a result: 15 seconds
Selecting another result: 15 seconds
……
Google Instant Search
<br />
<br />
Baidu<br />
<br />
<br />
哪 <br />
10
Entering a query: 9 seconds
Selecting a result: 15 seconds
Selecting another result: 15 seconds
……
<br />
<br />
11
么 <br />
<br />
<br />
<br />
关 ,<br />
<br />
12
2005 Simon Fraser University
2007, <br />
2008<br />
.<br />
2010.<br />
13
Wiki<br />
关 <br />
<br />
<br />
<br />
(2007)<br />
(2007)<br />
14
(2009)<br />
15
关 <br />
Goo5.cn (2010)
16
Architecture
[Figure: system architecture with Offline and Online parts; the only other surviving label is "Query"]
17
Technical Challenges<br />
<br />
<br />
<br />
18
US 12/253,949
CN 200810170855.3
"Method and System for a Web Search Engine Generating Summary-Style Search Results"
<br />
<br />
<br />
19
Duplicate Check
Language Detection
Sentence Detection
MapReduce
Segmentation
Part-of-Speech Tagging
Chunking
Map
Reduce
Entity Extraction
Entity Mapping
Relation/Event Extraction
…..
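To make the map/reduce split of the steps above more concrete, here is a minimal single-process sketch. It is not the lab's Hadoop pipeline: the regular-expression sentence splitter, the "capitalized word" stand-in for entity extraction, and all function names are assumptions chosen to keep the example short.

```python
import re
from collections import defaultdict


def split_sentences(text):
    """Naive sentence detection: split after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def map_document(doc_id, text):
    """Map: emit (entity, (doc_id, sentence)) for each capitalized word."""
    for sentence in split_sentences(text):
        for entity in set(re.findall(r"\b[A-Z][a-z]+\b", sentence)):
            yield entity, (doc_id, sentence)


def reduce_entity(entity, values):
    """Reduce: collect the distinct sentences that mention one entity."""
    return entity, sorted(set(values))


def run(documents):
    """Tiny in-memory driver: map, shuffle (group by key), then reduce."""
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_document(doc_id, text):
            groups[key].append(value)
    return dict(reduce_entity(k, v) for k, v in groups.items())


docs = {
    1: "Google launched in 1998. Baidu launched in 2000.",
    2: "Google introduced instant search.",
}
index = run(docs)
print(index["Google"])
```

The map step works on one document at a time and the reduce step on one entity at a time, so both can be spread across machines; a real pipeline would replace the regular expressions with proper segmentation, tagging and entity models.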
<br />
<br />
<br />
关 <br />
20
Traditional inverted index
Term      Postings
aid       4 8
all       2 4 6
camera    1 3 7
brown     1 3 5 7
come      2 4 6 8
soccer    3 5
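For readers unfamiliar with how such an index answers a query, the sketch below runs an AND query over the posting lists from the table. The posting lists are the slide's; the `intersect` and `search_and` helpers are illustrative names, and the two-pointer merge is the textbook technique rather than anything the slides specify.

```python
# Posting lists copied from the slide: term -> sorted document ids.
POSTINGS = {
    "aid": [4, 8],
    "all": [2, 4, 6],
    "camera": [1, 3, 7],
    "brown": [1, 3, 5, 7],
    "come": [2, 4, 6, 8],
    "soccer": [3, 5],
}


def intersect(a, b):
    """Merge two sorted posting lists, keeping ids present in both."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_and(terms, postings=POSTINGS):
    """Documents containing every term; shortest lists are merged first."""
    lists = sorted((postings.get(t, []) for t in terms), key=len)
    if not lists:
        return []
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


print(search_and(["brown", "camera"]))  # [1, 3, 7]
print(search_and(["brown", "soccer"]))  # [3, 5]
```

The index returns whole documents; it says nothing about which sentences inside them matter.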
<br />
21
query<br />
/<br />
关 <br />
<br />
<br />
<br />
MapReduce
Sentence Clustering
Sentence Ordering
Paragraph Generation
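The three steps above can be illustrated with a very small sketch. It is only one possible reading of the slide: the Jaccard word-overlap similarity, the greedy clustering, the 0.5 threshold and the "order by first appearance" rule are all assumptions, not the method used in the system.

```python
def tokens(sentence):
    """Lowercased word set with trailing punctuation stripped."""
    return {w.strip(".,!?").lower() for w in sentence.split()} - {""}


def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(sentences, threshold=0.5):
    """Sentence clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of (position, sentence)
    for pos, sentence in enumerate(sentences):
        for c in clusters:
            if jaccard(tokens(sentence), tokens(c[0][1])) >= threshold:
                c.append((pos, sentence))
                break
        else:
            clusters.append([(pos, sentence)])
    return clusters


def generate_paragraph(sentences, threshold=0.5):
    """Sentence ordering + paragraph generation: keep one representative per
    cluster (its earliest sentence) and join them in order of appearance."""
    representatives = [min(c) for c in cluster(sentences, threshold)]
    return " ".join(s for _, s in sorted(representatives))


sentences = [
    "Google launched in 1998.",
    "Baidu launched in 2000.",
    "Google was launched in 1998.",
    "PageRank ranks pages.",
]
print(generate_paragraph(sentences))
```

Near-duplicate sentences collapse into one cluster, so the generated paragraph states each fact once.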
<br />
Wiki<br />
<br />
22
关 <br />
query<br />
/<br />
关 <br />
: <br />
<br />
<br />
<br />
<br />
<br />
Object Clustering
Feature Mapping
Summary Generation
<br />
<br />
<br />
23
关 <br />
2006DUC34<br />
, <br />
70-80%<br />
<br />
%<br />
<br />
确 90%<br />
78%<br />
确 <br />
24
Intelligent Information Processing & Cloud Computing Lab<br />
<br />
<br />
挖 <br />
Web 挖 <br />
刘 <br />
<br />
<br />
阳 圣 <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
Goo5<br />
帮 <br />
<br />
<br />
<br />
<br />
<br />
Hadoop<br />
<br />
<br />
>2M<br />
>4M<br />
>M<br />
>20M
25
System Demo<br />
26
27
28
关 <br />
29
关<br />
<br />
<br />
<br />
30
iSimilar <br />
31
32