• User #21 contributed Comment #37 on Article #529
• User #192 asked Question #512
• User #451 posted Answer #3 to Question #924
• User #1392 has read Article #456
• User #8923 is interested in being alerted when a change is made to Article #223
• User #8923 is interested in being alerted when an answer to Question #9213 is posted

We are careful to record authorship because attributed content contributes to our chances of building a real community. To offer users the service of email notifications when someone responds to a question or comments on an article, it is necessary to record authorship.

Why record the fact that a particular user has read, or at least downloaded, a particular document? Consider an online learning community of professors and students at a university. It is necessary to record readership if one wishes to write a robot that sends out messages like the following:

To: Sam Student
From: Community Nag Robot
Date: Friday, 4:30 pm
Subject: Your Lazy Bones

Sam,

I notice that you have four assignments due on Monday and that you have not even looked at two of them. I hope that you aren't planning to go to a fraternity party tonight instead of studying.

Very truly yours,

Some SQL Code

Once an online learning community is recording the act of readership, it is natural to consider recording whether or not the act of reading proved worthwhile. In general, collaborative filtering is the last refuge of those too cowardly to edit. However, recording "User

Chapter 16
User Activity Analysis

This chapter looks at ways that you can monitor user activity within your community and how that information can be used to personalize a user's experience.

16.1 Step 1: Ask the Right Questions

Before considering what is technically feasible, it is best to start with a wishlist of the questions about user activity that have relevance for your client's application.
Here are some starter questions:

• What are the URLs that are producing server errors? [answer leads to action: fix broken code]
• How many users requested non-existent files and where did they get the bad URLs? [answer leads to action: fix bad links]
• Are at least 50 percent of users visiting /foobar/, our newest and most important section? [answer leads to action: maybe add more pointers to the new section from other areas of the site]
• How popular are the voice and wireless interfaces to the application? [answer leads to action: invest more effort in popular interfaces]
• Which pages are causing users to get stuck and abandon their sessions? I.e., what are the typical last pages viewed before a user disappears for the day? [answer leads to action: clarify user interface or annotation on those pages]
• Suppose that we operate an ecommerce site and that we've purchased advertisements on Google and www.nytimes.com. How likely are visitors from those two sources to buy something? How do the dollar amounts compare? [answer leads to action: buy more ads from the place that sends high-profit users]

16.2 Step 2: Look at What's Easily Available

Every HTTP server program can be configured to log its actions. Typically the server will write two logs: (1) the "access log", containing one line corresponding to every user request, and (2) the "error log", containing complete information about what went wrong during those requests that resulted in program errors. A "file not found" will result in an access log entry.
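The first starter question above, which URLs are producing server errors, can be answered straight from the access log. Here is a minimal sketch, assuming the server writes the standard Common Log Format; the function name `error_urls` and the sample log lines are invented for illustration, not taken from any particular server:

```python
import re
from collections import Counter

# Common Log Format: host ident user [date] "METHOD url HTTP/x" status bytes
CLF = re.compile(r'\S+ \S+ \S+ \[[^\]]+\] "(?:\S+) (\S+) [^"]*" (\d{3}) \S+')

def error_urls(lines, threshold=500):
    """Count requests whose HTTP status code indicates a server error."""
    errors = Counter()
    for line in lines:
        m = CLF.match(line)
        if m and int(m.group(2)) >= threshold:
            errors[m.group(1)] += 1
    return errors

log = [
    '10.0.0.1 - - [01/Jan/2004:10:00:00 -0500] "GET /foobar/ HTTP/1.0" 200 512',
    '10.0.0.2 - - [01/Jan/2004:10:00:01 -0500] "GET /report HTTP/1.0" 500 0',
    '10.0.0.3 - - [01/Jan/2004:10:00:02 -0500] "GET /report HTTP/1.0" 500 0',
]
print(error_urls(log).most_common())  # [('/report', 2)]
```

The same scan, with `threshold=400` and a test for status 404, would answer the second question about requests for non-existent files.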
15.18 Time and Motion

The team should work together with the client to develop the ontology. These discussions and the initial documentation should require 2 to 3 hours. Designing the metadata data model may be a simple copy/paste operation for teams building with Oracle, but in any case should require no more than an hour. Generating the DDL statements and drop tables script should take about two hours of work by one programmer. Building out the system pages, Exercises 5 through 10, should require 8 to 12 programmer-hours. This part can be divided to an extent, but it's probably best to limit the programming to two individuals working together closely since the exercises build upon one another. Finally, the writeups at the end should take one to two hours total.

#7241 really liked Article #2451" opens up interesting possibilities for personalization.

Consider a corporate knowledge management system. At the beginning the database is empty and there are only a few users. Scanning the titles of all contributed content would take only a few minutes. After 5 years, however, the database contains 100,000 documents and the 10,000 active users are contributing several hundred new documents every day (keep in mind that a question or answer in a discussion forum is a "document" for the purpose of this discussion). If Jane User wants to see what her coworkers have been up to in the last 24 hours, it might take her 30 minutes to scan the titles of the new content. Jane User may well abandon an online learning community that, when smaller, was very useful to her.

Suppose now that the database contains 100 entries of the form "Jane liked this article" and 100 entries of the form "Jane did not like this article". Before Jane has arrived at work, a batch job can compare every new article in the system to the 100 articles that Jane liked and the 100 articles that Jane said she did not like.
This comparison can be done using most standard full-text search software, which will take two documents and score them for similarity based on words used. Each new document is given a score of the form

avg(similarity(:new_doc, all_docs_marked_as_liked_by_user(:user_id)))
-
avg(similarity(:new_doc, all_docs_marked_as_disliked_by_user(:user_id)))

The new documents are then presented to Jane ranked by descending score. If you're an Intel stockholder you'll be pleased to consider the computational implications of this personalization scheme. Every new document must be compared to every document previously marked by a user. Perhaps that is 200 comparisons. If there are 10,000 users, this scoring operation must be repeated 10,000 times. So that is 2,000,000 comparisons per day per new document in the system. Full-text comparisons generally are quite slow as they rely on looking up each word in a document to find its occurrence frequency in standard written English. A comparison of two documents can take 1/10th of a second of CPU time. We're thus looking at about 200,000 seconds of CPU time per new document added to the system, plus the insertion of 10,000 rows in the database, each row containing the personalization score of that document for a particular user.
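The scoring formula above can be made concrete with a toy sketch. The following Python uses plain cosine similarity over word counts as a stand-in for a real full-text engine's similarity function (which would also weight words by rarity); the helper names and the sample liked/disliked documents are invented for illustration:

```python
import math
from collections import Counter

def similarity(doc_a, doc_b):
    """Cosine similarity between two documents' word-frequency vectors.
    A stand-in for the full-text engine's scorer; no rarity weighting."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values())) *
            math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def personalization_score(new_doc, liked, disliked):
    """avg similarity to liked docs minus avg similarity to disliked docs."""
    def avg(docs):
        return sum(similarity(new_doc, d) for d in docs) / len(docs) if docs else 0.0
    return avg(liked) - avg(disliked)

liked = ["oracle data model design", "sql query tuning"]
disliked = ["fraternity party schedule"]
new_docs = ["sql data model tips", "party schedule for friday"]
ranked = sorted(new_docs,
                key=lambda d: personalization_score(d, liked, disliked),
                reverse=True)
# "sql data model tips" ranks first: it resembles what Jane liked,
# not what she disliked.
```

Note that the batch job's cost is exactly what the text warns about: each call to `personalization_score` runs one `similarity` per previously marked document, so the work grows as (marked docs per user) × (users) × (new docs per day).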