01.01.2015 Views

Dr. Yabo Xu, School of Software, Sun Yat-sen University


Search the Next

Google, 1998
Baidu, 2000
……
PageRank

WHAT IS NEXT

2


Eric Schmidt, Google CEO
200993

3


Entering a query: 9 seconds
Selecting a result: 15 seconds
Selecting another result: 15 seconds
……
Network time: 400 ms
Serving results: ~300 ms
Network time: 400 ms
Read results
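The timings above can be added up to see where a search session spends its time. The following is a minimal sketch in Python; the grouping into "user" and "system" time and the request/response labels on the two network legs are assumptions made for illustration, not part of the slide.

```python
# Timings taken from the slide. The split into "user" and "system" time
# and the request/response labels are assumptions for illustration.
user_seconds = {
    "entering a query": 9.0,
    "selecting a result": 15.0,
    "selecting another result": 15.0,
}
system_seconds = {
    "network time (request)": 0.4,
    "serving results": 0.3,  # the slide says "~300 ms"
    "network time (response)": 0.4,
}

user_total = sum(user_seconds.values())
system_total = sum(system_seconds.values())
system_share = system_total / (user_total + system_total)

print(f"user time:   {user_total:.1f} s")
print(f"system time: {system_total:.1f} s")
print(f"system share of the session: {system_share:.1%}")
```

Under these assumptions the user-side steps add up to about 39 seconds against roughly 1.1 seconds of network and serving time, which suggests why speeding up the user side is the larger opportunity.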

4


Google Instant Search
People type slowly but read fast.
Search as you type.
Save 2-5 seconds per search.

5


Technical Challenges

Instant Search
= Query Prediction + Continuous Search Results Updating
= 5-7× the search traffic (already 1 billion searches per day)

Google's solution:
New cache mechanisms
Re-use of the search results among all requests
Optimization of JavaScript on the client side
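The two ingredients named above, query prediction and re-use of results across requests, can be illustrated with a toy model. This is a hedged sketch in Python, not Google's implementation: the class `InstantSearchSketch`, the frequency-based predictor, and `fake_backend` are all hypothetical names introduced for illustration.

```python
from collections import Counter


class InstantSearchSketch:
    """Toy model: predict the full query from a prefix, then serve cached results."""

    def __init__(self, query_log, backend):
        self.query_counts = Counter(query_log)  # past queries and their frequency
        self.backend = backend                  # hypothetical function: query -> results
        self.cache = {}                         # shared by all users and requests
        self.backend_calls = 0

    def predict(self, prefix):
        """Return the most frequent logged query starting with prefix, or None."""
        candidates = [(count, q) for q, count in self.query_counts.items()
                      if q.startswith(prefix)]
        if not candidates:
            return None
        # Highest count wins; ties are broken alphabetically for determinism.
        best_count = max(c for c, _ in candidates)
        return min(q for c, q in candidates if c == best_count)

    def search(self, query):
        """Look up results, calling the backend only on a cache miss."""
        if query not in self.cache:
            self.backend_calls += 1
            self.cache[query] = self.backend(query)
        return self.cache[query]

    def on_keystroke(self, typed):
        """Called on every keystroke: predict the query, then update the results."""
        predicted = self.predict(typed) or typed
        return predicted, self.search(predicted)


def fake_backend(query):
    # Stand-in for a real search backend.
    return [f"result for '{query}'"]


engine = InstantSearchSketch(
    query_log=["weather", "weather", "web search", "wikipedia"],
    backend=fake_backend,
)
for typed in ["w", "we", "wea", "weat"]:
    predicted, results = engine.on_keystroke(typed)
    print(typed, "->", predicted, results)
print("backend calls:", engine.backend_calls)
```

In this sketch all four keystrokes resolve to the same predicted query, so the backend is called once and the remaining updates are served from the shared cache.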

6


Baidu's Answer
+ 开

7



8


9


Entering a query: 9 seconds
Selecting a result: 15 seconds
Selecting another result: 15 seconds
……
Google Instant Search
Baidu

10


Entering a query: 9 seconds
Selecting a result: 15 seconds
Selecting another result: 15 seconds
……

11



12


2005: Simon Fraser University
2007
2008
2010

13


Wiki
(2007)
(2007)

14


(2009)

15


Goo5.cn (2010)

16


Architecture
Offline
Online
Query

17


Technical Challenges

18


US 12/253,949
200810170855.3
“Method and System for A Web Search Engine Generating Summary-Style Search Results”

19


Duplicate Check
Language Detection
Sentence Detection
MapReduce
Segmentation
Part-of-Speech Tagging
Chunking
Map
Reduce
Entity Extraction
Entity Mapping
Relation/Event Extraction
…..
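The way text-processing steps like these fit the Map and Reduce phases can be sketched in a single process. The Python example below is only an illustration under stated assumptions: `detect_sentences` and `extract_entities` are naive placeholders (punctuation splitting and capitalized words), not the system's actual components, and `run_job` stands in for a real MapReduce framework.

```python
import re
from collections import defaultdict


def detect_sentences(text):
    """Naive sentence detection: split after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_entities(sentence):
    """Placeholder entity extraction: capitalized words that do not start the sentence."""
    words = re.findall(r"[A-Za-z0-9]+", sentence)
    return [w for w in words[1:] if w[0].isupper()]


def map_document(doc_id, text):
    """Map step: emit (entity, (doc_id, sentence_index)) pairs for one document."""
    for idx, sentence in enumerate(detect_sentences(text)):
        for entity in extract_entities(sentence):
            yield entity, (doc_id, idx)


def reduce_entity(entity, occurrences):
    """Reduce step: collect the sorted, de-duplicated occurrences of one entity."""
    return entity, sorted(set(occurrences))


def run_job(documents):
    """Single-process stand-in for the shuffle between map and reduce."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_document(doc_id, text):
            grouped[key].append(value)
    return dict(reduce_entity(k, v) for k, v in grouped.items())


documents = {
    1: "The search engine Google was founded in 1998. It uses PageRank.",
    2: "The search engine Baidu was founded in 2000.",
}
entity_index = run_job(documents)
print(entity_index)
```

Because each document is mapped independently, the per-document steps can run in parallel, and the reduce step only needs the grouped occurrences for one entity at a time.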

20


Traditional inverted index

Term      Postings
aid       4 8
all       2 4 6
camera    1 3 7
brown     1 3 5 7
come      2 4 6 8
soccer    3 5
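A traditional inverted index like the one listed above maps each term to a sorted list of document IDs, and a multi-term query is answered by intersecting those lists. The Python sketch below shows both steps; the postings in `slide_index` are copied from the slide, while `toy_docs` is a hypothetical corpus used only to demonstrate index construction.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(postings_a, postings_b):
    """Merge two sorted postings lists, keeping IDs present in both (AND query)."""
    i, j, result = 0, 0, []
    while i < len(postings_a) and j < len(postings_b):
        if postings_a[i] == postings_b[j]:
            result.append(postings_a[i])
            i += 1
            j += 1
        elif postings_a[i] < postings_b[j]:
            i += 1
        else:
            j += 1
    return result


# Postings copied from the slide.
slide_index = {
    "aid": [4, 8],
    "all": [2, 4, 6],
    "camera": [1, 3, 7],
    "brown": [1, 3, 5, 7],
    "come": [2, 4, 6, 8],
    "soccer": [3, 5],
}
print(intersect(slide_index["camera"], slide_index["brown"]))
print(intersect(slide_index["all"], slide_index["come"]))

# Hypothetical toy corpus, only to show index construction.
toy_docs = {1: "brown camera", 2: "all come", 3: "brown camera soccer"}
print(build_inverted_index(toy_docs))
```

Keeping postings sorted is what lets the intersection run in a single linear pass over both lists.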

21


query
MapReduce
Sentence Clustering
Sentence Ordering
Paragraph Generation
Wiki
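The three steps named above (sentence clustering, sentence ordering, paragraph generation) can be illustrated with a deliberately simple stand-in. The Python sketch below uses greedy clustering on Jaccard word overlap, which is an assumption chosen for brevity and not necessarily the method used in the system; the function names and the `threshold` parameter are hypothetical.

```python
def tokens(sentence):
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,;:!?").lower() for w in sentence.split()} - {""}


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    return len(a & b) / len(a | b) if (a | b) else 0.0


def cluster_sentences(sentences, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed sentence is similar enough."""
    clusters = []  # each cluster is a list of indices into `sentences`
    for i, sentence in enumerate(sentences):
        for cluster in clusters:
            seed = sentences[cluster[0]]
            if jaccard(tokens(sentence), tokens(seed)) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def generate_paragraph(sentences, threshold=0.5):
    """Keep one sentence per cluster, ordered by first appearance, and join them."""
    clusters = cluster_sentences(sentences, threshold)
    # Clusters are already created in order of first appearance; the explicit
    # sort only makes the ordering step visible.
    ordered = sorted(clusters, key=lambda c: c[0])
    return " ".join(sentences[c[0]] for c in ordered)


sentences = [
    "Google was founded in 1998.",
    "Google was founded in 1998 in California.",
    "Baidu was founded in 2000.",
]
print(cluster_sentences(sentences))
print(generate_paragraph(sentences))
```

Here the two near-duplicate sentences about Google fall into one cluster, so the generated paragraph keeps only one of them alongside the sentence about Baidu.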

22


query
Object Clustering
Feature Mapping
Summary Generation

23


2006DUC34
70-80%
90%
78%

24


Intelligent Information Processing & Cloud Computing Lab
Mining
Web Mining
Goo5
Hadoop
>2M
>4M
>M
>20M

25


System Demo

26


27


28



29



30


iSimilar

31


32
