13.07.2015

Software Engineering for Internet Applications - Student Community


it, then copy the dev code over to the production directory and restart.

What's wrong with the two-server plan? Nothing if the development and testing teams are the same, in which case there is no possibility of simultaneous development and testing. For a complex site, however, the publisher may wish to spend a week testing before launching a revision. It isn't acceptable to idle authors and developers while a handful of testers bangs away at the development server. The addition of a staging server, rooted at /web/foobar-staging/ (Server 3), allows development to proceed while testers are preparing for the public launch of a new version.

Here's how the three servers are used:

1. developers work continuously in /web/foobar-dev/
2. when the publisher is mostly happy with the development site, a named version or branch is created and installed at /web/foobar-staging/
3. the testers bang away at the /web/foobar-staging/ server, checking fixes back into the version control repository but only into the staging branch
4. when the testers and publishers sign off on the staging server's performance, the site is released to /web/foobar/ (production)
5. any fixes made to the staging branch of the code that have not already been fixed by the development team are merged back into the development branch in the version control repository

Item 2: Two or Three RDBMS Users/Tablespaces

Suppose that the publisher has a working production site running version 1.0 of the software. One could connect the development server rooted at /web/foobar-dev/ to the production database. After all, the raison d'être of the RDBMS is concurrency control.
It will be happy to handle eight simultaneous connections from a production Web server plus two or three from a development server. The fly in this ointment is that one of the developers might get sloppy and write a program that sends drop table users rather than drop table users_experimental_extra_table to the database. Or, less dramatically, a junior developer might leave out a WHERE clause in an SQL statement and inadvertently request a result set of 10^9 rows, thus slowing down the production site.

Inserting a new document into the collection will be slow. We'll have to go through the document, word by word, and update as many rows in the index as there are distinct words in the document. But that extra work at insertion time pays off in a reduction in query time from O[N] to O[1].

Given a data structure of the preceding form, we can quickly find all documents containing the word "running". We can also quickly find all documents containing the word "shoes". We can intersect these result sets quickly, giving us the documents that contain both "running" and "shoes". With some fancier indexing data structures we can restrict our search to documents that contain the contiguous phrase "running shoes" as opposed to documents where those words appear separately. But suppose that there are 1000 documents in the collection containing these two words. Which are the most relevant to the user's query of "running shoes"?

We need a new data structure: the word-frequency histogram. This will tell us which words occur in a document and how frequently they occur in a way that is easily adjusted for the total length of a document.

Here's a word-frequency histogram for the first sentence of Anna Karenina:

Word       Count  Frequency
all        1      1/16
another    1      1/16
but        1      1/16
each       1      1/16
families   1      1/16
family     1      1/16
happy      1      1/16
in         1      1/16
is         1      1/16
its        1      1/16
one        1      1/16
own        1      1/16
resemble   1      1/16
unhappy    2      2/16
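The three-server workflow in the numbered list above can be sketched as a small Python script. This is a minimal illustration under stated assumptions, not the book's tooling: plain directory copies stand in for checking out a named branch from version control, the file name index.html is hypothetical, and a scratch directory stands in for /web/. The merge-back of step 5 and the server restart belong to the version control system and the Web server and are not modeled.

```python
import shutil
import tempfile
from pathlib import Path


def copy_tree(src: Path, dst: Path) -> None:
    """Replace dst with a fresh copy of src (dst is deleted first if present)."""
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(src, dst)


def release(staging: Path, production: Path) -> None:
    """Promote the signed-off staging tree to production.

    The new tree is assembled beside production first and then swapped in
    with renames, so production is never left half-copied.
    """
    new = production.with_name(production.name + ".new")
    old = production.with_name(production.name + ".old")
    copy_tree(staging, new)
    if old.exists():
        shutil.rmtree(old)
    if production.exists():
        production.rename(old)  # keep the previous release around for rollback
    new.rename(production)


# Demonstration in a scratch directory standing in for /web/.
web = Path(tempfile.mkdtemp())
dev, staging, prod = web / "foobar-dev", web / "foobar-staging", web / "foobar"
dev.mkdir()
(dev / "index.html").write_text("version 1.1 (dev)\n")

copy_tree(dev, staging)  # step 2: install the branch (here, a plain copy) on staging
(staging / "index.html").write_text("version 1.1 (tester fix)\n")  # step 3
(dev / "index.html").write_text("version 1.2 work in progress\n")  # development continues
release(staging, prod)  # step 4: sign-off, release to production
print((prod / "index.html").read_text(), end="")
```

Production receives the tester's fix while the developers' unfinished 1.2 work stays in the development tree, which is the point of adding the staging server.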
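Item 2's point, that development code should not share a database with production, can be illustrated with Python's built-in sqlite3 module. SQLite has no RDBMS users or tablespaces, so this sketch assumes one database file per environment as a stand-in; the file names foobar.db and foobar-dev.db and the helpers connect() and has_users_table() are hypothetical. The sloppy developer's drop table users then destroys only the development copy.

```python
import sqlite3
import tempfile
from contextlib import contextmanager
from pathlib import Path

# Hypothetical layout: one database file per environment stands in for the
# separate RDBMS users/tablespaces (SQLite itself has neither).
root = Path(tempfile.mkdtemp())
DATABASES = {
    "production": root / "foobar.db",
    "dev": root / "foobar-dev.db",
}


@contextmanager
def connect(env: str):
    """Open the database for one environment; commit on success, then close."""
    conn = sqlite3.connect(DATABASES[env])
    try:
        with conn:  # commits the transaction, or rolls it back on an exception
            yield conn
    finally:
        conn.close()


def has_users_table(env: str) -> bool:
    with connect(env) as conn:
        (n,) = conn.execute(
            "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = 'users'"
        ).fetchone()
    return n == 1


# Both environments start with the same schema and a row of data.
for env in DATABASES:
    with connect(env) as conn:
        conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT)")
        conn.execute("INSERT INTO users (email) VALUES ('alice@example.com')")

# The sloppy developer's mistake reaches only the development database.
with connect("dev") as conn:
    conn.execute("DROP TABLE users")

print(has_users_table("production"), has_users_table("dev"))  # True False
```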
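The "data structure of the preceding form" is not shown in this excerpt; a common form is an inverted index, a map from each word to the documents that contain it. The minimal in-memory Python sketch below assumes a crude lower-case tokenizer (the regular expression in words() is an illustrative choice) and hypothetical names. It shows why insertion costs one update per distinct word, while each query-word lookup is an expected constant-time hash probe followed by a set intersection.

```python
import re
from collections import defaultdict


def words(text: str) -> list[str]:
    """Lower-case the text and split it into words (crude, illustrative tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        # word -> set of ids of the documents containing that word
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        # The slow part: one index update per distinct word in the document.
        for word in set(words(text)):
            self.postings[word].add(doc_id)

    def docs_with_all(self, *query_words: str) -> set[int]:
        # Each lookup is a hash probe; .get() avoids creating empty entries.
        sets = [self.postings.get(w.lower(), set()) for w in query_words]
        if not sets:
            return set()
        return set.intersection(*sets)


index = InvertedIndex()
index.add(1, "Running shoes for trail running")
index.add(2, "Dress shoes and running socks")
index.add(3, "Running a marathon")
print(sorted(index.docs_with_all("running", "shoes")))  # [1, 2]
```

Document 2 matches even though "running" and "shoes" are not adjacent there, which is the limitation the text's "fancier indexing data structures" address.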
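One such fancier structure, assumed here rather than taken from the text, is a positional index that records where in each document every word occurs. A phrase matches a document when some occurrence of its first word is followed by the remaining words at consecutive positions. The class and method names below are hypothetical, and the tokenizer is the same crude regular expression as before.

```python
import re
from collections import defaultdict


def words(text: str) -> list[str]:
    """Lower-case the text and split it into words (crude, illustrative tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


class PositionalIndex:
    def __init__(self) -> None:
        # word -> {doc_id -> list of positions at which the word occurs}
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id: int, text: str) -> None:
        for pos, word in enumerate(words(text)):
            self.postings[word][doc_id].append(pos)

    def phrase(self, phrase: str) -> set[int]:
        terms = words(phrase)
        if not terms or any(t not in self.postings for t in terms):
            return set()
        # Candidate documents contain every term somewhere...
        candidates = set.intersection(*(set(self.postings[t]) for t in terms))
        hits = set()
        for doc in candidates:
            later = [set(self.postings[t][doc]) for t in terms[1:]]
            # ...and some occurrence of the first term is followed by the rest, in order.
            for p in self.postings[terms[0]][doc]:
                if all(p + i + 1 in positions for i, positions in enumerate(later)):
                    hits.add(doc)
                    break
        return hits


pindex = PositionalIndex()
pindex.add(1, "Running shoes for trail running")
pindex.add(2, "Dress shoes and running socks")
print(sorted(pindex.phrase("running shoes")))  # [1]
```

Storing positions makes the index larger than a plain word-to-documents map, which is the usual price of supporting phrase queries.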
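A word-frequency histogram like the Anna Karenina table can be computed with collections.Counter. This sketch prints frequencies as count/total, matching the table's unreduced 2/16 style, and uses a hypothetical sample sentence. The relevance() function is an assumed, deliberately simple scoring rule (the sum of the query words' counts divided by document length). It is offered only to show how the histogram adjusts for document length and is not the book's ranking method.

```python
import re
from collections import Counter


def words(text: str) -> list[str]:
    """Lower-case the text and split it into words (crude, illustrative tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def histogram(text: str) -> list[tuple[str, int, str]]:
    """Return (word, count, "count/total") rows in alphabetical order."""
    counts = Counter(words(text))
    total = sum(counts.values())
    return [(w, c, f"{c}/{total}") for w, c in sorted(counts.items())]


def relevance(text: str, query: str) -> float:
    """Hypothetical score: summed counts of the query words, normalized by length."""
    counts = Counter(words(text))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[w] for w in set(words(query))) / total


for word, count, freq in histogram("The cat sat on the mat"):
    print(f"{word:<8}{count:<7}{freq}")

short_ad = "Running shoes on sale"
long_review = "A long review of shoes for running and for walking and for hiking"
print(relevance(short_ad, "running shoes"))     # 0.5
print(relevance(long_review, "running shoes"))  # 2/13, about 0.154
```

Both sample documents contain each query word once, but dividing by document length ranks the short, focused one higher.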
