Software Engineering for Internet Applications - Student Community
Our current content_raw table contains some information that depends on the whole key of content_id and version_number, e.g., the body and the language code. But much of the information depends only on the content_id portion of the key: author, creation time, release time, zip code.

When we need to store statements about two different kinds of things, it makes sense to create two different tables, i.e., to use Second Normal Form:

    -- stuff about an item that doesn't change
    -- from version to version
    create table content_raw (
        content_id      integer primary key,
        content_type    varchar(100) not null,
        refers_to       references content_raw,
        creation_user   not null references users,
        creation_date   date not null,
        release_time    date,
        expiration_time date,
        mime_type       varchar(100) not null,
        zip_code        varchar(5)
    );

    -- stuff about a version of an item
    create table content_versions (
        version_id       integer primary key,
        content_id       not null references content_raw,
        version_date     date not null,
        language         char(2) references language_codes,
        one_line_summary varchar(200) not null,
        body             blob,
        editorial_status varchar(30)
            check (editorial_status in
                   ('submitted','rejected','approved','expired')),
        -- audit the person who made the last change
        -- to editorial status
        editor_id             references users,
        editorial_status_date date
    );

How does one query into the versions table and find the latest version? A first try might look something like the following:

    select *
    from content_versions
    where content_id = 5657
    and editorial_status = 'approved'

Oracle Text, via the "INSO filters" option, has the capability to index a remarkable variety of documents in a BLOB column. For example, the software can recognize a Microsoft Excel document, pull the text out and add it to the index. At the same time it is smart enough to know when to ignore a document entirely, e.g., if the BLOB column were filled with a JPEG photograph.

12.5 Exercise 1: Expected Queries

Ask your client what kinds of queries he or she expects to be most common in your community.
For example, in a site for academics it might be very important to type in a person's name and get all of the publications authored by that person. In a site for shoppers it might be essential to query for a brand name and get back product reviews. Only your client can say authoritatively.

12.6 Exercise 2: Document Your Design

Place a document at /doc/search in which you describe your team's plan for providing full-text search over the content on your site. If your content management system has left you with a mixed bag of stuff in the file system and stuff in the RDBMS, explain how you're going to synchronize and unify these documents in one full-text index. If nightly maintenance scripts are required, document them here. Include your client's answers to Exercise 1 in this document.

12.7 Exercise 3: Build the Basic Search Module

Build a basic search module that provides the following functions:

• user query from the URI /search/, targeting /search/results
• administrator ability to view statistics on the size and structure of the corpus (how many documents of each type, total size of collection)
• administrator ability to drop and rebuild the full-text index. Sadly this is necessary periodically with most tools and you don't want the publisher to be forced into obscure shell commands. An ideal solution will be completely maintainable from a Web browser.
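
The corpus statistics asked for in the second function above might be computed with two aggregate queries. The following is only a minimal sketch, under some loud assumptions: it uses Python's sqlite3 module and an in-memory SQLite database purely so that it runs standalone (an Oracle version would differ in details), the two tables are cut down to the columns the report needs, and the names corpus_statistics and build_demo_db are hypothetical helpers invented for this illustration. It also assumes that "documents of each type" means rows of content_raw grouped by content_type, and that "total size" means the bytes stored in all version bodies.

```python
import sqlite3

# Reduced version of the chapter's tables: only the columns this report reads.
# (Assumption: the real tables carry many more columns and constraints.)
SCHEMA = """
create table content_raw (
    content_id   integer primary key,
    content_type varchar(100) not null
);
create table content_versions (
    version_id integer primary key,
    content_id integer not null references content_raw,
    body       blob
);
"""


def corpus_statistics(conn):
    """Return ({content_type: item count}, total bytes stored in all bodies)."""
    by_type = dict(conn.execute(
        "select content_type, count(*) from content_raw "
        "group by content_type order by content_type"
    ).fetchall())
    # length() of a BLOB is its size in bytes in SQLite; NULL bodies are
    # skipped by sum(), and coalesce() covers the empty-table case.
    (total_bytes,) = conn.execute(
        "select coalesce(sum(length(body)), 0) from content_versions"
    ).fetchone()
    return by_type, total_bytes


def build_demo_db():
    """Hypothetical sample data: three items, four versions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "insert into content_raw values (?, ?)",
        [(1, "article"), (2, "article"), (3, "comment")],
    )
    conn.executemany(
        "insert into content_versions values (?, ?, ?)",
        [(1, 1, b"hello"), (2, 1, b"hello world"), (3, 2, b"abc"), (4, 3, None)],
    )
    return conn


if __name__ == "__main__":
    print(corpus_statistics(build_demo_db()))
```

An administration page would render the two return values as a small table. Whether old versions should count toward the total size is a design decision for your team; this sketch counts every stored version.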