13.07.2015 Views

Software Engineering for Internet Applications - Student Community


15.18 Time and Motion

The team should work together with the client to develop the ontology. These discussions and the initial documentation should require 2 to 3 hours. Designing the metadata data model may be a simple copy/paste operation for teams building with Oracle, but in any case should require no more than an hour. Generating the DDL statements and drop tables script should take about two hours of work by one programmer. Building out the system pages, Exercises 5 through 10, should require 8 to 12 programmer-hours. This part can be divided to an extent, but it's probably best to limit the programming to two individuals working together closely since the exercises build upon one another. Finally, the writeups at the end should take one to two hours total.

#7241 really liked Article #2451" opens up interesting possibilities for personalization.

Consider a corporate knowledge management system. At the beginning the database is empty and there are only a few users. Scanning the titles of all contributed content would take only a few minutes. After 5 years, however, the database contains 100,000 documents and the 10,000 active users are contributing several hundred new documents every day (keep in mind that a question or answer in a discussion forum is a "document" for the purpose of this discussion). If Jane User wants to see what her coworkers have been up to in the last 24 hours, it might take her 30 minutes to scan the titles of the new content. Jane User may well abandon an online learning community that, when smaller, was very useful to her.

Suppose now that the database contains 100 entries of the form "Jane liked this article" and 100 entries of the form "Jane did not like this article". Before Jane has arrived at work, a batch job can compare every new article in the system to the 100 articles that Jane liked and the 100 articles that Jane said she did not like.
This comparison can be done using most standard full-text search software, which will take two documents and score them for similarity based on words used. Each new document is given a score of the form

    avg(similarity(:new_doc, all_docs_marked_as_liked_by_user(:user_id)))
    - avg(similarity(:new_doc, all_docs_marked_as_disliked_by_user(:user_id)))

The new documents are then presented to Jane ranked by descending score.

If you're an Intel stockholder you'll be pleased to consider the computational implications of this personalization scheme. Every new document must be compared to every document previously marked by a user. Perhaps that is 200 comparisons. If there are 10,000 users, this scoring operation must be repeated 10,000 times. So that is 2,000,000 comparisons per new document in the system. Full-text comparisons generally are quite slow as they rely on looking up each word in a document to find its occurrence frequency in standard written English. A comparison of two documents can take 1/10th of a second of CPU time. We're thus looking at about 200,000 seconds of CPU time per new document added to the system, plus the insertion of 10,000 rows in the database, each row containing the personalization score of that document for a particular
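The scoring rule and the cost estimate discussed here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by any particular full-text search product: the similarity function below is a plain cosine similarity over raw word counts (real full-text software would typically also weight words by how rare they are), the function names are hypothetical, and the cost helper reads the "1/10th" figure as one tenth of a second per comparison, which is what the 200,000-second total implies.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its words (a crude bag-of-words model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def similarity(doc_a, doc_b):
    """Cosine similarity of the two documents' word-count vectors (0 to 1).

    Assumption: a stand-in for whatever scoring a full-text engine provides.
    """
    a, b = word_counts(doc_a), word_counts(doc_b)
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def average(values):
    """Arithmetic mean; 0.0 for an empty sequence (user has marked nothing)."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def personalization_score(new_doc, liked_docs, disliked_docs):
    """avg(similarity to liked docs) minus avg(similarity to disliked docs)."""
    return (average(similarity(new_doc, d) for d in liked_docs)
            - average(similarity(new_doc, d) for d in disliked_docs))


def rank_new_documents(new_docs, liked_docs, disliked_docs):
    """Return the new documents sorted by descending personalization score."""
    return sorted(
        new_docs,
        key=lambda d: personalization_score(d, liked_docs, disliked_docs),
        reverse=True,
    )


def cpu_seconds_per_new_document(users=10_000, marked_docs_per_user=200,
                                 seconds_per_comparison=0.1):
    """Cost of scoring one new document for every user.

    Defaults follow the figures in the text; 0.1 s is the assumed reading
    of the per-comparison CPU time.
    """
    comparisons = users * marked_docs_per_user
    return comparisons * seconds_per_comparison


if __name__ == "__main__":
    liked = ["oracle database tuning tips", "sql query tuning for oracle"]
    disliked = ["company picnic schedule", "parking lot picnic rules"]
    new = ["picnic schedule update", "oracle sql tuning guide"]
    # The Oracle-related document shares words only with the liked set,
    # so it is ranked first.
    for doc in rank_new_documents(new, liked, disliked):
        print(doc)
    print(cpu_seconds_per_new_document(), "CPU seconds per new document")
```

With the default figures the helper reproduces the estimate of roughly 200,000 CPU seconds per new document, which is a little over two days of CPU time for each document added.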
