01.01.2015 Views

Dr. Yabo Xu, School of Software, Sun Yat-sen University


Search the Next

Google, 1998
Baidu, 2000
……
PageRank

WHAT IS NEXT


Eric Schmidt, Google CEO

200993


Entering a query: 9 seconds
Selecting a result: 15 seconds
Selecting another result: 15 seconds
…….

Network time: 400 ms
Serving results: ~300 ms
Network time: 400 ms
Read results
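To make the point of these timings concrete, the following minimal sketch adds up the figures from the slide. It counts only the steps that are listed (the slide elides further steps with "……."), and it treats "~300 ms" as exactly 300 ms, which is an assumption. The names `HUMAN_STEPS`, `MACHINE_STEPS` and `total_ms` are illustrative only.

```python
# Timings as given on the slide, converted to milliseconds.
HUMAN_STEPS = [
    ("entering a query", 9_000),
    ("selecting a result", 15_000),
    ("selecting another result", 15_000),
]
MACHINE_STEPS = [
    ("network time", 400),
    ("serving results", 300),  # the slide says "~300ms"; rounded here
    ("network time", 400),
]


def total_ms(steps):
    """Sum the durations (in ms) of a list of (label, ms) pairs."""
    return sum(ms for _, ms in steps)


human_ms = total_ms(HUMAN_STEPS)
machine_ms = total_ms(MACHINE_STEPS)
# Fraction of the listed time that is spent in the network and the servers.
machine_share = machine_ms / (human_ms + machine_ms)

print(f"user: {human_ms / 1000:.1f} s, "
      f"system: {machine_ms / 1000:.1f} s, "
      f"system share: {machine_share:.1%}")
```

Under these assumptions the user spends 39 s typing and choosing, while the network and servers account for about 1.1 s, so most of the remaining room for speed-up is on the user's side.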


Google Instant Search

People type slowly but read fast.
Search as you type.
Save 2-5 seconds per search.


Technical Challenges

Instant Search
= Query Prediction + Continuous Search Results Updating
= 57 (1 billion searches per day already)

Google's solution:
New cache mechanisms
Re-use of the search results among all requests
Optimization of JavaScript on the client side
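The two ingredients named on this slide, query prediction and continuously updated results backed by a cache, can be illustrated with a small sketch. This is not Google's implementation: the query log `QUERY_LOG`, the stand-in backend `search`, and the helper names are all hypothetical, and the prediction rule (the most frequent logged query with the typed prefix) is an assumption chosen for brevity.

```python
from collections import Counter
from functools import lru_cache
from typing import List, Optional, Tuple

# Hypothetical query log: query -> how often it was issued.
QUERY_LOG = Counter({
    "weather": 50,
    "weather today": 30,
    "web search": 20,
    "wiki": 10,
})

BACKEND_CALLS: List[str] = []  # records queries that actually hit the backend


def predict(prefix: str) -> Optional[str]:
    """Return the most frequent logged query that starts with `prefix`."""
    candidates = [(count, q) for q, count in QUERY_LOG.items()
                  if q.startswith(prefix)]
    return max(candidates)[1] if candidates else None


@lru_cache(maxsize=1024)
def search(query: str) -> Tuple[str, ...]:
    """Stand-in for the search backend; cached so results are re-used."""
    BACKEND_CALLS.append(query)
    return (f"result for {query!r}",)


def instant_search(typed: str) -> List[Tuple[str, Optional[str]]]:
    """Simulate typing: after each keystroke, predict and refresh results."""
    shown = []
    for i in range(1, len(typed) + 1):
        prefix = typed[:i]
        predicted = predict(prefix)
        if predicted is not None:
            search(predicted)  # served from the cache when already computed
        shown.append((prefix, predicted))
    return shown
```

Typing "we" triggers a prediction after each keystroke, but both keystrokes predict "weather", so the backend is queried only once. That is the effect the slide's "re-use of the search results" aims for.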


Baidu's Answer




Entering a query: 9 seconds
Selecting a result: 15 seconds
Selecting another result: 15 seconds
…….

Google Instant Search

Baidu






2005: Simon Fraser University
2007
2008
2010


Wiki

(2007)

(2007)


(2009)


Goo5.cn (2010)


Architecture

Offline

Online

Query


Technical Challenges


US 12/253,949

200810170855.3

"Method and System for A Web Search Engine Generating Summary-Style Search Results"


Duplicate Check
Language Detection
Sentence Detection

MapReduce

Segmentation
Part-of-Speech Tagging
Chunking

Map

Reduce

Entity Extraction
Entity Mapping
Relation/Event Extraction
…..
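The slide places these processing stages inside a MapReduce job. The sketch below shows one way such a job could be shaped, run in-process rather than on Hadoop. It is heavily simplified and rests on assumptions: the text is English, sentence detection and segmentation are done with regular expressions, and "entity extraction" is reduced to picking capitalized tokens that are not sentence-initial. The function names `map_doc`, `reduce_entity` and `run_job` are hypothetical.

```python
import re
from collections import defaultdict


def map_doc(doc_id, text):
    """Map step: detect sentences, tokenize, emit (entity, (doc_id, sentence))."""
    # Sentence detection: split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # Segmentation: a plain alphanumeric tokenizer (assumption).
        tokens = re.findall(r"[A-Za-z0-9]+", sentence)
        # Naive entity extraction: capitalized tokens after the first one.
        for token in tokens[1:]:
            if token[0].isupper():
                yield token, (doc_id, sentence)


def reduce_entity(entity, values):
    """Reduce step: collect the distinct mentions of one entity, sorted."""
    return entity, sorted(set(values))


def run_job(docs):
    """Run map, shuffle (group by key) and reduce in a single process."""
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for key, value in map_doc(doc_id, text):
            grouped[key].append(value)
    return dict(reduce_entity(key, values) for key, values in grouped.items())
```

For example, `run_job({1: "The founder of Google spoke. He left.", 2: "Users like Google and Baidu."})` groups both sentences that mention "Google" under one key and the second sentence under "Baidu". A real pipeline would replace the regular expressions with proper language-specific components.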


Term      Postings
aid       4 8
all       2 4 6
camera    1 3 7
brown     1 3 5 7
come      2 4 6 8
soccer    3 5

Traditional inverted index
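As a reminder of how a traditional inverted index answers a query, the sketch below loads the postings lists from the table and intersects them for a conjunctive (AND) query. The two-pointer merge is the standard textbook technique; the names `INDEX`, `intersect` and `and_query` are illustrative and not taken from the talk.

```python
from typing import Dict, List

# Postings lists copied from the slide: term -> sorted document IDs.
INDEX: Dict[str, List[int]] = {
    "aid": [4, 8],
    "all": [2, 4, 6],
    "camera": [1, 3, 7],
    "brown": [1, 3, 5, 7],
    "come": [2, 4, 6, 8],
    "soccer": [3, 5],
}


def intersect(a: List[int], b: List[int]) -> List[int]:
    """Intersect two sorted postings lists with a two-pointer merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(terms: List[str]) -> List[int]:
    """Return the IDs of documents that contain every term in `terms`."""
    # Start with the shortest lists so intermediate results stay small.
    postings = sorted((INDEX.get(t, []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result
```

With the table's data, `and_query(["brown", "camera"])` returns `[1, 3, 7]` and `and_query(["aid", "soccer"])` returns an empty list, since no document contains both terms.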


query

MapReduce

Sentence Clustering
Sentence Ordering
Paragraph Generation

Wiki
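The three stages on this slide (sentence clustering, sentence ordering, paragraph generation) can be sketched as follows. The slide does not say which algorithms are used, so everything here is an assumption: clustering is a greedy single pass using Jaccard word overlap with a hypothetical threshold of 0.5, ordering puts larger clusters first, and the paragraph is the first sentence of each cluster joined together.

```python
import re
from typing import List, Set


def words(sentence: str) -> Set[str]:
    """Lower-cased word set of a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two word sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_sentences(sentences: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy clustering: join the first cluster whose lead sentence is similar."""
    clusters: List[List[str]] = []
    for s in sentences:
        for c in clusters:
            if jaccard(words(s), words(c[0])) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])  # no similar cluster found: start a new one
    return clusters


def generate_paragraph(sentences: List[str], threshold: float = 0.5) -> str:
    """Order clusters by size (largest first) and join one sentence from each."""
    clusters = cluster_sentences(sentences, threshold)
    # sorted() is stable, so clusters of equal size keep their input order.
    ordered = sorted(clusters, key=len, reverse=True)
    return " ".join(c[0] for c in ordered)
```

Near-duplicate sentences collapse into one cluster, so the generated paragraph states each point once, and points supported by more sentences come first.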


query

Object Clustering
Feature Mapping
Summary Generation


2006DUC34

70-80%

90%

78%


Intelligent Information Processing & Cloud Computing Lab

Web mining

Goo5

Hadoop

>2M
>4M
>M
>20M


System Demo




iSimilar

