26.04.2015 Views

Founders at Work.pdf


Stephen Kaufer

Kaufer: We tried different approaches. We tried to randomly crawl the Web and we thought, “How are we going to randomly select out of the billions of pages?” So we tried crawling from known travel hubs. We’d start from the Yahoo Travel directory and see where those sites led us. We tried to pick out good, interesting information and automatically categorize it. That didn’t work so well. What we call the signal-to-noise ratio wasn’t good enough, meaning that, when they got our results back, people wouldn’t say, “Oh yeah, that’s what I was looking for.”

We ended up looking at all of the published sources of information (newspapers, magazines) and manually went through all the websites from all these places to find the ones that had free access to the back issues of their travel articles. Then we hired people to read every single travel article we could find on the Net, and classify that article into our database, and write a one-line summary. It’s a fairly significant effort, and people that we talked to said, “You’re nuts. You’ll never finish.” But if you actually do the math, you realize that you can work through the backlog (it took us a couple of years, but it was only a couple of years) and then can stay current with what’s being published without too much of an effort.

We take half an hour to read an article, on average, and we’ll tag that article as being relevant to everything the article talks about. If the article is about Maui and things to do in Hawaii and these two resorts, whenever you’re searching for Maui or things to do in Hawaii or those two resorts, that article will come up. If that article happened to mention, “The beaches in Maui are much better than the beaches in Fort Lauderdale,” and you were to search on the beaches in Fort Lauderdale, that article is not going to come up, because our search isn’t keyword-based. It doesn’t matter if the article happens to mention something; you only want to read the article if it’s actually giving you an opinion on the topic you’re researching.
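The difference between tag-based lookup and keyword matching described above can be illustrated with a small sketch. This is not TripAdvisor's actual system; the article records, the tag names, and the functions `build_tag_index`, `search_by_tag`, and `search_by_keyword` are all hypothetical, and the tags stand in for what a human reader would have assigned.

```python
from collections import defaultdict

# Hypothetical article records. The "tags" field stands in for the topics a
# human reader decided the article is actually about.
ARTICLES = [
    {
        "id": 1,
        "summary": "Beaches and resorts on Maui",
        "text": "The beaches in Maui are much better than the beaches in Fort Lauderdale.",
        "tags": {"maui", "things to do in hawaii"},
    },
    {
        "id": 2,
        "summary": "A weekend in Fort Lauderdale",
        "text": "Fort Lauderdale has a long public beach and a busy boardwalk.",
        "tags": {"fort lauderdale"},
    },
]


def build_tag_index(articles):
    """Map each human-assigned tag to the ids of the articles carrying it."""
    index = defaultdict(list)
    for article in articles:
        for tag in article["tags"]:
            index[tag].append(article["id"])
    return index


def search_by_tag(index, topic):
    """Return only articles that a reader tagged as being about the topic."""
    return index.get(topic.lower(), [])


def search_by_keyword(articles, phrase):
    """Naive keyword search: any article whose text mentions the phrase."""
    return [a["id"] for a in articles if phrase.lower() in a["text"].lower()]


TAG_INDEX = build_tag_index(ARTICLES)
```

In this sketch, a keyword search for "Fort Lauderdale" returns both articles because the Maui article mentions the city in passing, while the tag lookup returns only the article that is actually about Fort Lauderdale.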

What we ended up with was a much smaller database as measured by the number of documents that we’d indexed, but extremely, extremely relevant. You go to a page about Maui, and every article on that page really is about Maui, sorted to a pretty good degree based upon which article most people would rather read first. Would you rather read an article that has a paragraph about Maui in talking about fun beaches around the world, or an article all about beaches in Maui? Probably the latter, so that’s why the article is sorted first. Your experience on TripAdvisor (again, this was initially, when we launched the site) was very fulfilling, because the information we found was always spot-on. We didn’t always have something, but what we had was always a match.
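One way to picture the ordering described above is to give each tag a score for how central the topic is to the article and sort on it. The interview does not say how the ordering was computed, so the `focus` score, the `TaggedArticle` class, and `rank_for_topic` below are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaggedArticle:
    title: str
    # Hypothetical score per topic, from 0.0 (passing coverage) to 1.0
    # (the whole article is about the topic), as judged by a human reader.
    focus: dict


def rank_for_topic(articles, topic):
    """Return articles tagged with the topic, most focused first."""
    matches = [a for a in articles if topic in a.focus]
    return sorted(matches, key=lambda a: a.focus[topic], reverse=True)


# Made-up example data, not taken from the source.
SAMPLE = [
    TaggedArticle("Fun beaches around the world", {"maui": 0.2, "beaches": 0.9}),
    TaggedArticle("All about beaches in Maui", {"maui": 1.0, "beaches": 0.8}),
    TaggedArticle("Skiing in Vermont", {"vermont": 1.0}),
]

MAUI_PAGE = [a.title for a in rank_for_topic(SAMPLE, "maui")]
```

Under these assumed scores, the article devoted to Maui's beaches is listed ahead of the worldwide roundup, and the Vermont article does not appear on the Maui page at all.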

Jumping forward in time as the site grew, all of a sudden now those hundreds of thousands of articles are dwarfed by the user reviews that our visitors have generated. It’s fresher information and tends to be more detailed. To many people, it’s more reliable.

There’s a whole other theoretical discussion of, “Would you rather read a review about a hotel by Aunt Mary you’ve never heard of from Bloomingdale, Indiana, or from Frommer’s, the trusted guidebook brand?” And the follow-up
