13.07.2015 Views

Software Engineering for Internet Applications - Student Community



we had a lot of classified ads whose subject lines were of the form "Reduced to $395!" A check through the server logs revealed that the ad had been posted earlier that day with a price of $400, then edited a few hours later. So technically the subject line was true but it was misleading. Instead of hiring additional administrators to notice this kind of problem, we changed the software to store all previous versions of a classified ad. When presenting an ad that had been edited, the new scripts offered a link to view old versions of the ad. The practice of screaming "Reduced!" stopped.

Version control becomes critical for preventing lost updates when people are working together. Here's how a lost update can happen:

• Ira grabs Version A of a document at 9:00 am from the Web site in order to fix a typo. He fixes it at 9:01 am but forgets to write the document back to the Web site.
• Shoshana grabs Version A at 10:00 am and spends six hours adding a chapter of text, writing it back at 4:00 pm (call this Version B).
• Ira notices that he forgot to write his typo correction back to the server and does so at 5:00 pm (call this Version C).

Unfortunately, Version C (the typo fix) is what future users will see; all of Shoshana's work was wasted.

Programmers and technical writers at large companies are familiar with the problem of lost updates when multiple people are editing the same document. File-system based version control systems were developed to help coordinate multiple contributors. These systems include the original Walter Tichy's Revision Control System (RCS; early 1980s), Dick Grune and Brian Berliner's Concurrent Versions System (CVS; 1986), and Marc Rochkind's Source Code Control System (SCCS; 1972). These systems require more training than is practical for casual users. For example, RCS mandates explicit check-out and check-in.
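A minimal Python sketch of a check-out/check-in discipline, replayed on Ira and Shoshana's day, might look like the following. It is a toy in-memory model and not RCS's actual implementation; the `LockingStore` class, its method names, and `LockedError` are hypothetical names invented for illustration.

```python
class LockedError(Exception):
    """Raised when a user touches a file that someone else has checked out."""


class LockingStore:
    """Toy model of strict check-out/check-in locking (an assumption, not RCS itself)."""

    def __init__(self, text):
        self.versions = [text]   # every checked-in version, oldest first
        self.locked_by = None    # user currently holding the lock, if any

    def check_out(self, user):
        # Refuse the check-out if a different user already holds the lock.
        if self.locked_by is not None and self.locked_by != user:
            raise LockedError(f"locked by {self.locked_by}")
        self.locked_by = user
        return self.versions[-1]

    def check_in(self, user, text):
        # Only the lock holder may write a new version back.
        if self.locked_by != user:
            raise LockedError(f"{user} does not hold the lock")
        self.versions.append(text)
        self.locked_by = None


store = LockingStore("Version A, with a tpyo")

# 9:00 am: Ira checks the document out and fixes the typo locally.
ira_copy = store.check_out("Ira").replace("tpyo", "typo")

# 10:00 am: Shoshana is refused, so she cannot start from stale Version A.
try:
    store.check_out("Shoshana")
    shoshana_blocked = False
except LockedError:
    shoshana_blocked = True

# 5:00 pm: Ira finally checks in; only now can Shoshana begin her chapter.
store.check_in("Ira", ira_copy)
shoshana_copy = store.check_out("Shoshana")
store.check_in("Shoshana", shoshana_copy + " plus a new chapter")
```

In this sketch the lock prevents the lost update, because Shoshana's chapter is built on top of Ira's fix, but only at the price of Shoshana waiting all day for Ira to check in.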
While a file is checked out by User A it is locked and nobody but User A can check it back in. Suppose that User A goes out to lunch but there is some important news that absolutely must be put on the site. What if User A leaves for a two-week vacation and forgets to check a bunch of files back in? These problems can be worked around manually but it becomes a challenge when the collaborators are on opposite sides of the globe and cannot see "Oh, Schlomo's coat is still on the back of his chair so he's not yet left for the day."

12.11 Exercise 6: robots.txt

Place a file on your server at /robots.txt that excludes robots from appropriate portions of your server. Put some comments at the top of the file explaining who created this, when it was created, and the rationale behind the exclusions.

If you're doing a 100 percent database-backed content management system, you are free to put the content of the robots.txt file in the RDBMS, just so long as it is served when the URI /robots.txt is requested.

12.12 Exercise 7: Client Signoff

Review the search facility, both user and admin pages, with your client. Write down your client's reaction to this new module, paying particular attention to any new ideas that the client might have for what will be typical queries on the site.

12.13 The Future

As an online community grows older and larger it becomes ever more likely that a user will be overwhelmed with "100,000 documents matched your query". When a community is new and small it is possible to search for an answer merely by reading the titles of everything on the site, i.e., by browsing. As a community grows, therefore, information retrieval tools become more important.

The exercises in this chapter focus on answering a user's query by presenting links to relevant documents. Suppose that we build a search facility that always returns the very most relevant document in the corpus.
Is that an optimal solution? Only if you believe that users like to read.

Suppose that Joe User visits photo.net and types "At what shutter speeds is a tripod required?" into the search box. Is it reasonable to assume that Joe wants to read a 10,000-word document that contains the answer to this question? Or would Joe rather get ... the answer to his question. The answer "at shutter speeds slower than 1/lens-focal-length" is a lot smaller and quicker to read than a document containing this information.

To get a feel for how a question answering system can be built on top of a full-text indexer, read "Scaling Question Answering to the Web" (Cody Kwok, Oren Etzioni, Dan Weld; WWW10 conference, May 2001; http://www.cs.washington.edu/homes/ctkwok/paper/), which describes a system built at the University of Washington. This system includes all of the expected linguistic gymnastics plus code to sort out
