13.07.2015 Views

Software Engineering for Internet Applications - Student Community


Our current content_raw table contains some information that depends on the whole key of content_id and version_number, e.g., the body and the language code. But much of the information depends only on the content_id portion of the key: author, creation time, release time, zip code.

When we need to store statements about two different kinds of things, it makes sense to create two different tables, i.e., to use Second Normal Form:

    -- stuff about an item that doesn't change
    -- from version to version
    create table content_raw (
        content_id       integer primary key,
        content_type     varchar(100) not null,
        refers_to        references content_raw,
        creation_user    not null references users,
        creation_date    date not null,
        release_time     date,
        expiration_time  date,
        mime_type        varchar(100) not null,
        zip_code         varchar(5)
    );

    -- stuff about a version of an item
    create table content_versions (
        version_id        integer primary key,
        content_id        not null references content_raw,
        version_date      date not null,
        language          char(2) references language_codes,
        one_line_summary  varchar(200) not null,
        body              blob,
        editorial_status  varchar(30)
            check (editorial_status in
                   ('submitted','rejected','approved','expired')),
        -- audit the person who made the last change
        -- to editorial status
        editor_id         references users,
        editorial_status_date date
    );

How does one query into the versions table and find the latest version? A first try might look something like the following:

    select *
    from content_versions
    where content_id = 5657
    and editorial_status = 'approved'

Oracle Text, via the "INSO filters" option, has the capability to index a remarkable variety of documents in a BLOB column. For example, the software can recognize a Microsoft Excel document, pull the text out and add it to the index. At the same time it is smart enough to know when to ignore a document entirely, e.g., if the BLOB column were filled with a JPEG photograph.

12.5 Exercise 1: Expected Queries

Ask your client what kinds of queries he or she expects to be most common in your community.
For example, in a site for academics it might be very important to type in a person's name and get all of the publications authored by that person. In a site for shoppers it might be essential to query for a brand name and get back product reviews. Only your client can say authoritatively.

12.6 Exercise 2: Document Your Design

Place a document at /doc/search in which you describe your team's plan for providing full-text search over the content on your site. If your content management system has left you with a mixed bag of stuff in the file system and stuff in the RDBMS, explain how you're going to synchronize and unify these documents in one full-text index. If nightly maintenance scripts are required, document them here. Include your client's answers to Exercise 1 in this document.

12.7 Exercise 3: Build the Basic Search Module

Build a basic search module that provides the following functions:

• user query from the URI /search/, targeting /search/results
• administrator ability to view statistics on the size and structure of the corpus (how many documents of each type, total size of collection)
• administrator ability to drop and rebuild the full-text index. Sadly this is necessary periodically with most tools and you don't want the publisher to be forced into obscure shell commands. An ideal solution will be completely maintainable from a Web browser.
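
The corpus statistics asked for in the second bullet can be sketched as a single grouped query over the content_raw and content_versions tables. The example below is only an illustration, not the book's solution. It assumes SQLite (through Python's standard sqlite3 module) in place of Oracle, a trimmed-down copy of the two tables, made-up sample rows, and the hypothetical helper names build_demo_database and corpus_statistics. It also assumes that "the corpus" means the most recent approved version of each item, which is one possible reading.

```python
import sqlite3

# Trimmed-down SQLite rendering of the two tables; the subset of columns
# kept here is an assumption made to keep the sketch short.
SCHEMA = """
create table content_raw (
    content_id   integer primary key,
    content_type varchar(100) not null,
    mime_type    varchar(100) not null
);
create table content_versions (
    version_id       integer primary key,
    content_id       integer not null references content_raw,
    version_date     date not null,
    one_line_summary varchar(200) not null,
    body             blob,
    editorial_status varchar(30)
);
"""


def build_demo_database():
    """Create an in-memory database holding a few invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "insert into content_raw values (?, ?, ?)",
        [(1, "article", "text/plain"),
         (2, "article", "text/plain"),
         (3, "comment", "text/plain")],
    )
    conn.executemany(
        "insert into content_versions values (?, ?, ?, ?, ?, ?)",
        [(1, 1, "2015-01-01", "first draft", b"hello", "approved"),
         (2, 1, "2015-02-01", "revised", b"hello world", "approved"),
         (3, 2, "2015-01-15", "another article", b"abc", "approved"),
         (4, 3, "2015-01-20", "edited comment", b"nice", "submitted"),
         (5, 3, "2015-01-10", "a comment", b"ok", "approved")],
    )
    return conn


def corpus_statistics(conn):
    """Return (content_type, document_count, total_bytes) rows.

    Only the most recent approved version of each item is counted.
    If two approved versions of one item share a version_date, both
    would be counted; a real module would need a tie-breaker.
    """
    query = """
        select r.content_type,
               count(*) as n_documents,
               coalesce(sum(length(v.body)), 0) as total_bytes
        from content_raw r
        join content_versions v on v.content_id = r.content_id
        where v.editorial_status = 'approved'
          and v.version_date = (select max(v2.version_date)
                                from content_versions v2
                                where v2.content_id = v.content_id
                                  and v2.editorial_status = 'approved')
        group by r.content_type
        order by r.content_type
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for content_type, n_documents, total_bytes in corpus_statistics(build_demo_database()):
        print(content_type, n_documents, total_bytes)
```

With the sample rows above, the earlier version of item 1 and the not-yet-approved edit of item 3 are left out of the totals. An administration page under the search module could render these rows as a table. The same correlated subquery on version_date is also one way to pick out the latest approved version of a single item.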
