than one call to contains in the same query. Thus the last argument of contains is an integer identifying the query, in this case "1". It is possible to get a relevance score out in the select list or in an ORDER BY clause with the function score and an argument identifying from which contains call the score should be pulled.

Oracle Text is one of the more difficult and complex Oracle RDBMS products to use. For example, if you want to be able to search for a phrase that occurs in either the one_line_summary or body and combine the relevance score, you need to build a multi-column index:

    ctx_ddl.create_preference('content_multi', 'MULTI_COLUMN_DATASTORE');
    ctx_ddl.set_attribute('content_multi', 'COLUMNS', 'one_line_summary, body');

    create index content_text
    on content(modified_date)
    indextype is ctxsys.context
    parameters('datastore content_multi');

Notice that the index itself is built on the column modified_date, which is not itself indexed. The call to ctx_ddl.set_attribute in which the COLUMNS attribute is set is what determines which columns get indexed.

For an example of a system that tackles the challenge of indexing text from disparate Oracle tables, see http://philip.greenspun.com/seia/examples-search/site-wide-search

Oracle Text also has the property that its default search mode is exact phrase matching. A user who types "zippy pinhead" into a search engine will expect to find documents that contain the phrase "Zippy the Pinhead". This won't happen if your script passes the raw user query right through to the contains operator. More problematic is what happens when a user types a query string that contains characters that Oracle Text treats specially. This can result in an error being raised by the SQL query and a "Server Error 500" returned to the user if you don't catch the error in your procedural script. It would be nice if Oracle Text had a built-in procedure called "ProcessRawQueryFromWebForm" or something. But it doesn't, at least not until the promised 9.2 version. The next best thing is a procedure called pavtranslate, available from http://technet.oracle.com/sample_code/products/text/htdocs/query_syntax_translators/query_syntax_translators.html.

Oracle Text, via the "INSO filters" option, has the capability to index a remarkable variety of documents in a BLOB column. For example, the software can recognize a Microsoft Excel document, pull the text out and add it to the index. At the same time it is smart enough to know when to ignore a document entirely, e.g., if the BLOB column were filled with a JPEG photograph.

12.5 Exercise 1: Expected Queries

Ask your client what kinds of queries he or she expects to be most common in your community. For example, in a site for academics it might be very important to type in a person's name and get all of the publications authored by that person. In a site for shoppers it might be essential to query for a brand name and get back product reviews. Only your client can say authoritatively.

12.6 Exercise 2: Document Your Design

Place a document at /doc/search in which you describe your team's plan for providing full-text search over the content on your site. If your content management system has left you with a mixed bag of stuff in the file system and stuff in the RDBMS, explain how you're going to synchronize and unify these documents in one full-text index. If nightly maintenance scripts are required, document them here. Include your client's answers to Exercise 1 in this document.

12.7 Exercise 3: Build the Basic Search Module

Build a basic search module that provides the following functions:

• user query from the URI /search/, targeting /search/results
• administrator ability to view statistics on the size and structure of the corpus (how many documents of each type, total size of collection)
• administrator ability to drop and rebuild the full-text index. Sadly this is necessary periodically with most tools and you don't want the publisher to be forced into obscure shell commands. An ideal solution will be completely maintainable from a Web browser.

Our current content_raw table contains some information that depends on the whole key of content_id and version_number, e.g., the body and the language code. But much of the information depends only on the content_id portion of the key: author, creation time, release time, zip code.

When we need to store statements about two different kinds of things, it makes sense to create two different tables, i.e., to use Second Normal Form:

    -- stuff about an item that doesn't change
    -- from version to version
    create table content_raw (
        content_id       integer primary key,
        content_type     varchar(100) not null,
        refers_to        references content_raw,
        creation_user    not null references users,
        creation_date    date not null,
        release_time     date,
        expiration_time  date,
        mime_type        varchar(100) not null,
        zip_code         varchar(5)
    );

    -- stuff about a version of an item
    create table content_versions (
        version_id        integer primary key,
        content_id        not null references content_raw,
        version_date      date not null,
        language          char(2) references language_codes,
        one_line_summary  varchar(200) not null,
        body              blob,
        editorial_status  varchar(30)
            check (editorial_status in ('submitted','rejected','approved','expired')),
        -- audit the person who made the last change
        -- to editorial status
        editor_id         references users,
        editorial_status_date  date
    );

How does one query into the versions table and find the latest version? A first try might look something like the following:

    select *
    from content_versions
    where content_id = 5657
    and editorial_status = 'approved'
    and version_date = (select max(version_date)
                        from content_versions
                        where content_id = 5657
                        and editorial_status = 'approved')

Is this guaranteed to return only one row? No! There is no unique constraint on content_id, version_date. In theory two editors or authors could submit new versions of an item within the same second. Remember that the date datatype in Oracle is precise only to within 1 second. Even more likely is that an editor doing a revision might click on an editing form submit button twice with the mouse or perhaps use the Reload command impatiently. Here's a slight improvement:

    select *
    from content_versions
    where content_id = 5657
    and editorial_status = 'approved'
    and version_id = (select max(version_id)
                      from content_versions
                      where content_id = 5657
                      and editorial_status = 'approved')

The version_id column is constrained unique but we're relying on unstated knowledge of our application code, i.e., that version_id will be larger for later versions.

Some RDBMS implementations have extended the SQL language so that you can ask for the first row returned by a query. A brief look at the Oracle manual would lead one to try

    select *
    from content_versions
    where content_id = 5657
    and editorial_status = 'approved'
    and rownum = 1
    order by version_date desc

but a deeper reading of the manual would reveal that the rownum pseudocolumn is set before the ORDER BY clause is processed. An accepted way to do this in one query is the nested SELECT:

    select *
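The difference between the max(version_date) query and the max(version_id) query above can be seen with a small experiment. What follows is a minimal sketch, not Oracle code: it assumes Python's built-in sqlite3 module as a stand-in for Oracle, a trimmed-down content_versions table, and made-up sample rows in which two approved versions of item 5657 share the same one-second timestamp. The helper name latest_version_ids and the sample data are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical sample data: versions 2 and 3 were saved within the same
# second (for example, a double-clicked submit button).
ROWS = [
    (1, 5657, "2003-01-10 09:00:00", "approved"),
    (2, 5657, "2003-02-01 14:30:05", "approved"),
    (3, 5657, "2003-02-01 14:30:05", "approved"),
    (4, 5657, "2003-03-01 08:00:00", "submitted"),
]

# First try from the text: pick the row(s) with the latest version_date.
# The ORDER BY is added here only to make the output order deterministic.
BY_DATE = """
    select version_id
    from content_versions
    where content_id = ?
    and editorial_status = 'approved'
    and version_date = (select max(version_date)
                        from content_versions
                        where content_id = ?
                        and editorial_status = 'approved')
    order by version_id
"""

# Slight improvement from the text: pick the row with the largest version_id.
BY_ID = """
    select version_id
    from content_versions
    where content_id = ?
    and editorial_status = 'approved'
    and version_id = (select max(version_id)
                      from content_versions
                      where content_id = ?
                      and editorial_status = 'approved')
"""


def latest_version_ids(rows, content_id):
    """Return (ids matched by the date query, ids matched by the id query)."""
    conn = sqlite3.connect(":memory:")
    # Simplified table: timestamps are ISO-8601 text with one-second
    # precision, so max() on the text column orders them chronologically.
    conn.execute(
        """create table content_versions (
               version_id integer primary key,
               content_id integer not null,
               version_date text not null,
               editorial_status varchar(30))"""
    )
    conn.executemany("insert into content_versions values (?, ?, ?, ?)", rows)
    by_date = [r[0] for r in conn.execute(BY_DATE, (content_id, content_id))]
    by_id = [r[0] for r in conn.execute(BY_ID, (content_id, content_id))]
    conn.close()
    return by_date, by_id


if __name__ == "__main__":
    by_date, by_id = latest_version_ids(ROWS, 5657)
    print("max(version_date) matches versions:", by_date)
    print("max(version_id) matches versions:", by_id)
```

With these sample rows the version_date query matches both version 2 and version 3, while the version_id query matches only version 3. The second result is still correct only under the assumption stated in the text, that later versions always receive larger version_id values.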