13.07.2015 Views

Software Engineering for Internet Applications - Student Community



12.8 Exercise 4: Big Brother

Generally users prefer to browse rather than search. If users are resorting to searches in order to get standard answers or perform common tasks, there may be something wrong with a site's navigation or information architecture. If users are performing searches and getting 0 results back from your full text search facility, either your index or the site's content needs augmentation.

Record user search strings in an RDBMS table and let admins see what the popular search terms are (by the day, week, or month). Make sure to highlight any searches that resulted in the user seeing a page "No documents matched your query". Ask yourself whether it would be ethical to implement a facility whereby the site administrators could view a report of search strings and the users who typed them in.

Update your /doc/search file to reflect the addition of this facility.

12.9 Exercise 5: Linkage

Find logical places among your community's pages to link to the search facility. For example, on many sites it will make sense to have a quick search box in the upper-right corner of every page served. On most sites it makes sense to link back to search from the search results page with a "search again" box filled in by default with the original query.

Make sure that your main documentation page links to the docs for this new module.

12.10 Working with the Public Search Engines

If your online community is on the public Internet you probably would like to see your content indexed by public search engines such as www.google.com. First, Google has to know about your server. This happens either when someone already in the Google index links to your site or when you manually add your URL from a form off the google.com home page. Second, Google has to be able to read the text on your server. At least as of 2003 none of the public search engines implemented optical character recognition (OCR).
This means that text embedded in a GIF, Flash animation, or a Java applet won't be indexed. It might be readable by a human user with perfect eyesight but it won't be readable by the computer programs that crawl the Web to build databases for public search engines. Third, Google has to be able to get into all the pages on your server.

);

Retrieving information for a specific version is easy. Retrieving information that is the same across multiple versions of a content item becomes clumsy and requires a GROUP BY, since we want to collapse information from several rows into a one-row report:

    -- note the use of MAX on VARCHAR column;
    -- this works just fine
    select content_id, max(zip_code)
    from content_raw
    where content_id = 5657
    group by content_id

We're not really interested in the largest zip code for a particular content item version. In fact, unless there has been some kind of mistake in our application code, we assume that all zip codes for multiple versions of the same content item are the same. However, GROUP BY is a mechanism for collapsing information from multiple rows. The SELECT list can contain column names only for those columns that are being GROUPed BY. Anything else in the SELECT list must be the result of aggregating the multiple values for columns that aren't GROUPed. The choices with most RDBMSes are pretty limited: MAX, MIN, AVERAGE, SUM. There is no "pick any" function. So we use MAX.

Updates are similarly problematic. The U.S. Postal Service periodically redraws the zip code maps. Updating one piece of information, e.g., "20016" to "20816", will touch more than one row per content item.

This data model is in First Normal Form.
Every value is available at the intersection of a table name, column name, and key (the composite primary key of content_id and version_number). However, it is not in Second Normal Form, which is why our queries and updates appear strange.

In Second Normal Form, all columns are functionally dependent on the whole key. Less formally, a Second Normal Form table is one that is in First Normal Form with a key that determines all non-key column values. Even less formally, a Second Normal Form table contains statements about only one kind of thing.
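
To make the difference concrete, here is a minimal sketch that runs the query and the update discussed above against an in-memory SQLite database. It then repeats them against one possible Second Normal Form split. Several things here are assumptions rather than part of the original data model. The full definition of content_raw is not shown in this section, so the body column is invented for illustration. The table names content_items and content_versions are hypothetical, as are the three sample rows. SQLite stands in for whatever RDBMS you are actually using.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-in for content_raw. Only content_id, version_number, and zip_code
# are named in the text; the body column is an assumption for illustration.
cur.execute("""
    create table content_raw (
        content_id     integer not null,
        version_number integer not null,
        zip_code       varchar(5),
        body           text,
        primary key (content_id, version_number)
    )
""")
cur.executemany(
    "insert into content_raw values (?, ?, ?, ?)",
    [
        (5657, 1, "20016", "first draft"),
        (5657, 2, "20016", "second draft"),
        (5657, 3, "20016", "third draft"),
    ],
)

# The query from the text: MAX collapses the repeated zip code into one row.
collapsed = cur.execute("""
    select content_id, max(zip_code)
    from content_raw
    where content_id = 5657
    group by content_id
""").fetchall()

# One possible Second Normal Form split (hypothetical table names).
# zip_code depends on content_id alone, so it moves to a table keyed by
# content_id; per-version data stays keyed by the composite key.
cur.execute("""
    create table content_items (
        content_id integer primary key,
        zip_code   varchar(5)
    )
""")
cur.execute("""
    create table content_versions (
        content_id     integer not null references content_items,
        version_number integer not null,
        body           text,
        primary key (content_id, version_number)
    )
""")
cur.execute("""
    insert into content_items
    select distinct content_id, zip_code from content_raw
""")
cur.execute("""
    insert into content_versions
    select content_id, version_number, body from content_raw
""")

# The same zip code change applied to both models.
cur.execute("update content_raw set zip_code = '20816' where content_id = 5657")
rows_touched_1nf = cur.rowcount  # one row per version

cur.execute("update content_items set zip_code = '20816' where content_id = 5657")
rows_touched_2nf = cur.rowcount  # one row per content item

# No GROUP BY and no aggregate are needed to read the zip code back.
zip_2nf = cur.execute(
    "select zip_code from content_items where content_id = 5657"
).fetchone()[0]

print(collapsed, rows_touched_1nf, rows_touched_2nf, zip_2nf)
```

In the split version, the statement "content item 5657 is in zip code 20816" is recorded exactly once, so both the awkward MAX and the multi-row update disappear. The price is a JOIN whenever you want version data and the zip code in the same report.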
