    create table content_raw (
        content_id          integer primary key,
        content_type        varchar(100) not null,
        refers_to           references content_raw,
        creation_user       not null references users,
        creation_date       date not null,
        release_time        date,
        expiration_time     date,
        -- some of our content is geographically specific
        zip_code            varchar(5),
        -- a lot of our readers will appreciate Spanish
        -- versions
        language            char(2) references language_codes,
        mime_type           varchar(100) not null,
        one_line_summary    varchar(200) not null,
        -- let's use BLOB in case this is a Microsoft Word doc
        -- or JPEG; a BLOB can also hold HTML or plain text
        body                blob,
        editorial_status    varchar(30)
            check (editorial_status in
                   ('submitted','rejected','approved','expired'))
    );

If this table is to contain 7 versions of an article with a Content ID of 5657, that will violate the primary key constraint on the content_id column. What if we remove the primary key constraint? In Oracle this prevents us from establishing referential integrity constraints pointing to this ID. With no integrity constraints, we will be running the risk, for example, that our database will contain comments on content items that have been deleted.

With multiple rows for each content item our pointers become ambiguous. The statement "User 739 has read Article 5657" points from a specific row in the users table into a set of rows in the content_raw table. Should we try to be more specific? Do we want a comment on an article to refer to a specific version of that article? Do we want to know that a reader has read a specific version of an article? Do we want to know that an editor has approved a specific version of an article? It depends. For some purposes we probably do want to point to a version, e.g., for approval, and at other times we want to point to the article in the abstract.
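The primary key conflict described above can be reproduced in a few lines. The following is a minimal sketch, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for Oracle and a cut-down content_raw table with only three of the columns; the helper name insert_version is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle instance (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table content_raw (
        content_id        integer primary key,
        content_type      varchar(100) not null,
        one_line_summary  varchar(200) not null
    )
""")

def insert_version(content_id, summary):
    """Try to store one version of an article; return True if the row was accepted."""
    try:
        conn.execute(
            "insert into content_raw (content_id, content_type, one_line_summary) "
            "values (?, 'article', ?)",
            (content_id, summary),
        )
        return True
    except sqlite3.IntegrityError:
        # A second row with the same content_id violates the primary key.
        return False

first_ok = insert_version(5657, "first draft")
second_ok = insert_version(5657, "second draft")
row_count = conn.execute("select count(*) from content_raw").fetchone()[0]
print(first_ok, second_ok, row_count)
```

The second insert is rejected, so only one version of Article 5657 can ever be stored while content_id alone is the key.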
If we add a version_number column, this becomes relatively straightforward.

    create table content_raw (
        -- the combination of these two is the key
        content_id      integer,
        version_number  integer,
        ...
        primary key (content_id, version_number)

If you've been requiring registration to view discussions, for example, those discussions won't be indexed by Google unless your software is smart enough to recognize that it is Google behind the request and make an exception. How to recognize Google? Here's a one-line snippet from the photo.net access log (newlines inserted for readability):

    216.239.46.48 - - [19/Mar/2002:03:36:56 -0500]
    "GET /minolta/dimage-7/ HTTP/1.0"
    200 18881
    "" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

Notice the user-agent header at the end: Googlebot/2.1 (+http://www.googlebot.com/bot.html). Because some search engines archive what they index, you would not want to provide registration-free access to content that is truly private to members. In theory a <META NAME="ROBOTS" CONTENT="NOARCHIVE"> tag placed in the HEAD of your HTML documents would prevent search engines from archiving the page, but robots are not guaranteed to follow such directives.

Some search engines allow you to provide indexing hints and hints for presentation once a user is looking at a search results page. For example, the table of contents page for this book carries "keywords" and "description" META tags in the HEAD.

The "keywords" tag adds some words that are relevant to the document but not present in the visible text. This would help someone who decided to search for "MIT 6.171 textbook", for example. The "description" tag can be used by a search engine when summarizing a page. If it isn't present, a search engine may show the first 20 words on the page or follow some heuristics to build a reasonable summary. These tags have been routinely abused.
A publisher might add popular search terms such as "sex" to a site that is unrelated to those terms, in hopes of capturing more readers. A company might add the names of its competitors as keywords.
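As an illustration of recognizing a crawler from the access-log entry quoted earlier, here is a minimal sketch that pulls the user-agent field out of a log line and checks whether it claims to be Googlebot. The regular expression assumes the common "combined" log layout seen in that entry, and the function names are hypothetical. Because the user-agent header is supplied by the client and can be forged, a check like this is a hint at best and should not be the only thing guarding members-only content.

```python
import re

# Assumed layout: ip, identd, user, [time], "request", status, bytes,
# "referer", "user-agent" (the layout of the photo.net entry above).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def user_agent(log_line):
    """Return the user-agent field of a log line, or None if the line does not parse."""
    match = LOG_PATTERN.match(log_line)
    return match.group("agent") if match else None

def claims_to_be_googlebot(agent):
    """True if the user-agent string identifies itself as Googlebot (unverified)."""
    return agent is not None and "googlebot" in agent.lower()

line = (
    '216.239.46.48 - - [19/Mar/2002:03:36:56 -0500] '
    '"GET /minolta/dimage-7/ HTTP/1.0" 200 18881 '
    '"" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"'
)
agent = user_agent(line)
print(agent, claims_to_be_googlebot(agent))
```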
12.8 Exercise 4: Big Brother

Generally users prefer to browse rather than search. If users are resorting to searches in order to get standard answers or perform common tasks, there may be something wrong with a site's navigation or information architecture. If users are performing searches and getting 0 results back from your full text search facility, either your index or the site's content needs augmentation.

Record user search strings in an RDBMS table and let admins see what the popular search terms are (by the day, week, or month). Make sure to highlight any searches that resulted in the user seeing a page "No documents matched your query". Ask yourself whether it would be ethical to implement a facility whereby the site administrators could view a report of search strings and the users who typed them in.

Update your /doc/search file to reflect the addition of this facility.

12.9 Exercise 5: Linkage

Find logical places among your community's pages to link to the search facility. For example, on many sites it will make sense to have a quick search box in the upper-right corner of every page served. On most sites it makes sense to link back to search from the search results page with a "search again" box filled in by default with the original query.

Make sure that your main documentation page links to the docs for this new module.

12.10 Working with the Public Search Engines

If your online community is on the public Internet you probably would like to see your content indexed by public search engines such as www.google.com. First, Google has to know about your server. This happens either when someone already in the Google index links to your site or when you manually add your URL from a form off the google.com home page. Second, Google has to be able to read the text on your server. At least as of 2003 none of the public search engines implemented optical character recognition (OCR).
This means that text embedded in a GIF, Flash animation, or a Java applet won't be indexed. It might be readable by a human user with perfect eyesight but it won't be readable by the computer programs that crawl the Web to build databases for public search engines. Third, Google has to be able to get into all the pages on your server.

    );

Retrieving information for a specific version is easy. Retrieving information that is the same across multiple versions of a content item becomes clumsy and requires a GROUP BY, since we want to collapse information from several rows into a one-row report:

    -- note the use of MAX on VARCHAR column;
    -- this works just fine
    select content_id, max(zip_code)
    from content_raw
    where content_id = 5657
    group by content_id

We're not really interested in the largest zip code for a particular content item version. In fact, unless there has been some kind of mistake in our application code, we assume that all zip codes for multiple versions of the same content item are the same. However, GROUP BY is a mechanism for collapsing information from multiple rows. The SELECT list can contain column names only for those columns that are being GROUPed BY. Anything else in the SELECT list must be the result of aggregating the multiple values for columns that aren't GROUPed. The choices with most RDBMSes are pretty limited: MAX, MIN, AVG, SUM. There is no "pick any" function. So we use MAX.

Updates are similarly problematic. The U.S. Postal Service periodically redraws the zip code maps. Updating one piece of information, e.g., "20016" to "20816", will touch more than one row per content item.

This data model is in First Normal Form.
Every value is available at the intersection of a table name, column name, and key (the composite primary key of content_id and version_number). However, it is not in Second Normal Form, which is why our queries and updates appear strange.

In Second Normal Form, all columns are functionally dependent on the whole key. Less formally, a Second Normal Form table is one that is in First Normal Form with a key that determines all non-key column values. Even less formally, a Second Normal Form table contains statements about only one kind of thing.
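To make the Second Normal Form idea concrete, here is a minimal sketch of one possible decomposition, again assuming SQLite through Python's sqlite3 module rather than Oracle. The table names content_items and content_versions are hypothetical and only a few columns are shown. Facts that depend on content_id alone (the zip code) live in one table; facts that depend on the whole key (content_id, version_number) live in the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- statements about a content item as a whole: one row per item
    create table content_items (
        content_id  integer primary key,
        zip_code    varchar(5)
    );
    -- statements about one version of an item: one row per version
    create table content_versions (
        content_id        integer not null references content_items,
        version_number    integer not null,
        one_line_summary  varchar(200) not null,
        primary key (content_id, version_number)
    );
    insert into content_items values (5657, '20016');
    insert into content_versions values (5657, 1, 'first draft');
    insert into content_versions values (5657, 2, 'second draft');
""")

# No GROUP BY or MAX is needed: the zip code is stored exactly once per item.
zip_before = conn.execute(
    "select zip_code from content_items where content_id = 5657"
).fetchone()[0]

# A zip code change now touches one row, however many versions exist.
cursor = conn.execute(
    "update content_items set zip_code = '20816' where content_id = 5657"
)
rows_touched = cursor.rowcount

zip_after = conn.execute(
    "select zip_code from content_items where content_id = 5657"
).fetchone()[0]
version_count = conn.execute(
    "select count(*) from content_versions where content_id = 5657"
).fetchone()[0]
print(zip_before, zip_after, rows_touched, version_count)
```

With this split, each table contains statements about only one kind of thing, and the content_id in content_items remains available as a target for referential integrity constraints.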