13.07.2015 Views

Software Engineering for Internet Applications - Student Community


the Internet-specific problem of noise. Traditional information retrieval systems are designed to work with authoritative documents, e.g., the Encyclopedia Britannica, a binder of corporate policies, or the design notes for a jetliner. The documents in the corpus are presumed to be authoritative. There won't be four different answers, three of them flat wrong, to a question such as "In what year was Gioacchino Rossini born?", "How many signatures are required for a purchase of $57,300?", or "How wide is the wingspan of the airplane?" With user-authored content in an online community, however, it seems safe to assume that while the average answer is likely to be correct, for every 100 correct answers there will be at least 3 or 4 incorrect ones. Even when the data require no interpretation there will be typos. For example, a Google search for "rossini 1792-1868" returned 10,600 documents in April 2002; a search for "rossini 1792-1869" returned 14 documents. A question answering system built on top of lightly moderated user-authored content will have to exercise the same sort of judgement as do humans: how many documents contain Answer A versus Answer B? what is the relative authority of conflicting documents? which of two conflicting documents is more recent?

Mobile Internet devices put an even greater stress on information retrieval. Connection speeds are slower. Screens are smaller.
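The judgement calls listed above for conflicting documents (how many documents back each answer, how authoritative each one is, and how recent) can be sketched as a weighted vote. This is only an illustrative sketch: the `Doc` fields, the 0-to-1 authority score, and the ten-year half-life are assumptions made for the example, not something the text prescribes.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    answer: str       # the answer this document gives, e.g. "1868"
    authority: float  # hypothetical 0..1 trust score for the source
    published: date   # when the document was written


def pick_answer(docs, today, half_life_days=3650):
    """Return (best_answer, scores) from an authority- and recency-weighted vote.

    Returns (None, {}) when there are no documents to vote.
    """
    scores = defaultdict(float)
    for d in docs:
        age_days = max((today - d.published).days, 0)
        # Assumed decay: a document loses half its weight every half_life_days.
        recency = 0.5 ** (age_days / half_life_days)
        scores[d.answer] += d.authority * recency
    if not scores:
        return None, {}
    best = max(scores, key=scores.get)
    return best, dict(scores)
```

With made-up input in which many documents say "1868" and a few say "1869", all of similar authority and age, the function returns "1868"; a single highly authoritative or much newer document can outweigh several weak ones, which is the trade-off the three questions describe.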
It isn't practical for a user to drill down into 20 documents returned by a search engine as possibly relevant to a query, especially if the user is driving a car and using a voice browser.

If you want to emerge as a hero from the dust of the next Internet collapse, work on information retrieval.

12.14 More

• http://otn.oracle.com/products/text/, technical overviews for Oracle Text
• trec.nist.gov, for the proceedings of the Text REtrieval Conferences (TREC)

12.15 Time and Motion

The two client interviews, at the beginning of the exercises and again at the end, should each take under an hour.

The search design and documentation should be a team effort, and take one to two hours.

1. Joe User can transactionally sign up to write "Platinum prints", thus marking the article "assignment requested pending editorial approval", supplying a brief outline and committing to completing a draft by July 1.
2. Jane Editor can approve the outline and schedule, thus generating an email alert back to Joe.
3. Joe User gets periodic email reminders of what he has signed up to do and by when.
4. Jane Editor is alerted when Joe's first draft is submitted on July 17 (Joe is unlikely to be the first author in the history of the world to submit work on time).
5. Joe User gets an email alert asking him to review Jane's corrected version and sign off his approval.
6. The platinum printing article shows up at the top of Jane Editor's workspace page as "signed off by author" and she clicks to push it live.

Notice the intricacies of the workflow and also the idiosyncrasies. The New York Times and the Boston Globe put out very similar-looking products. They are owned by the same corporation.
What do you think the chances are that software that supports one newspaper's workflow will be adequate to support the other's?

6.10 Exercise 2

Lay out the workflow for each content item that will be user-visible in your online learning community. For each workflow step specify (1) who needs to give approval, (2) what email alerts are generated, (3) what happens if approval is given, and (4) what happens if approval is denied.

Tip: we recommend modeling workflow as a finite-state machine in which a content item can be in only one state at a time and that single state tells you everything that you need to know about the item. In other words, your software can take action without ever needing to go back and look to see what states the article was previously in.

6.11 Version Control (for Content)

Anyone involved in the administration and editing of an online learning community ought to be able to fetch an old version of a content item. If an author complains that a paragraph was dropped, the editors should be able to retrieve the first draft of the article from the content management system. Old versions are sometimes useful for public users as well. For example, on photo.net in the mid-1990s
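Returning to the tip in Exercise 2, the finite-state-machine approach to workflow might look like the minimal sketch below, applied to the Joe User and Jane Editor example. Only the two quoted state labels ("assignment requested pending editorial approval" and "signed off by author") come from the text; the other state names, the event names, the `REJECTED` state, and the `advance` helper are hypothetical choices made for illustration.

```python
from enum import Enum


class State(Enum):
    REQUESTED = "assignment requested pending editorial approval"
    ASSIGNED = "outline approved"                 # assumed label
    DRAFT_SUBMITTED = "draft submitted"           # assumed label
    EDITED = "edited, awaiting author review"     # assumed label
    SIGNED_OFF = "signed off by author"
    LIVE = "live"                                 # assumed label
    REJECTED = "rejected"                         # assumed label


# (current state, event) -> (next state, who receives an email alert)
TRANSITIONS = {
    (State.REQUESTED, "approve"): (State.ASSIGNED, "author"),
    (State.REQUESTED, "deny"): (State.REJECTED, "author"),
    (State.ASSIGNED, "submit_draft"): (State.DRAFT_SUBMITTED, "editor"),
    (State.DRAFT_SUBMITTED, "return_edits"): (State.EDITED, "author"),
    (State.EDITED, "sign_off"): (State.SIGNED_OFF, "editor"),
    (State.SIGNED_OFF, "publish"): (State.LIVE, None),
}


def advance(state, event):
    """Return (next_state, alert_recipient); raise if the event is not allowed.

    The decision depends only on the current state, never on the item's history.
    """
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.name}") from None
```

Starting from `State.REQUESTED` and applying `approve`, `submit_draft`, `return_edits`, `sign_off`, and `publish` in turn walks an item to `State.LIVE`. Because the transition table is keyed on the current state alone, the four questions in the exercise (who approves, which alerts fire, what happens on approval, what happens on denial) each become one row in the table.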
