13.07.2015 Views

Software Engineering for Internet Applications - Student Community



word              document_ids
absquatulate      612
bedizen           36, 9211
cryptogenic       9
dactylioglyph     7214
exheredate        57, 812, 4010
feuilleton        87, 349, 1203
genetotrophic     5000
hartebeest        710
inspissate        549, 21, 3987
...
samoyed           17, 91, 1000, 3492
sesquipedalian    723
the               1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, ...
uberous           6, 800
velutinous        45, 2307
widdershins       7300
xenial            3611
ypsiliform        5607
zibeline          4782

If we build this as a hash table, we have O(1) access to a row in the table. If we merely keep the rows in sorted order, we have O(log W) access to any row in the table, where W is the number of words in our vocabulary. Performance does not vary with the number of documents in the collection... or does it? Just about every English document will contain the word "the", and therefore simply returning the value of the document_ids column for the word "the" will take O(N) time, where N is the number of documents in the corpus. This row isn't useful anyway because it isn't selective, i.e., we could get the same information almost as fast with a sequential scan of the documents table, collecting all the document IDs. While indexing a document, a full-text search system will refer to a list of stopwords, words that are too common to be worth indexing. For standard English, the stopword list includes such words as "a", "and", "as", "at", "for", "or", "the", etc.

So it would seem that this publisher will need at least one new database. Here are the steps:

1. create a new database user and tablespace; if this is on a separate physical computer from your production RDBMS server it will protect your production server's performance from inadvertent denial-of-service attacks by sloppy development SQL statements
2. export the production database into a file system file, which is a good periodic practice in any case as it will verify the integrity of the database
3. import the database export into the new development database
4. every time that a developer alters a table, adds a table, or populates a new table, record the operation in a "patches.sql" file
5. when ready to move code from staging to production, apply all the data model modifications from patches.sql to the production RDBMS

Should there be three databases, i.e., one for dev, one for staging, and one for production? Not necessarily. Unless one expects radical data model evolution it may be acceptable to use the same database for development and staging. Keep in mind that adding a column to a relational database table seldom breaks old queries. This was one of the objectives set forth by E.F. Codd in 1970 in "A Relational Model of Data for Large Shared Data Banks" (http://www.acm.org/classics/nov95/toc.html) and certainly modern implementations of the relational model have lived up to Codd's hopes in this respect.

Item 3: One Version Control Repository

The function of the version control repository is to

• remember what all the previous checked-in versions of a file contained
• show the difference between what's in a checked-out tree and what's in the repository
• help merge changes made simultaneously by multiple authors who might have been unaware of each other's work
• group a snapshot of currently checked-in versions of files as "Release 2.1" or "JuneIssue"
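
To make the second function in the list above a little more concrete, here is a minimal sketch of what "showing the difference" amounts to. It is not how any particular version control system is implemented; it simply uses Python's standard difflib module to compare two in-memory copies of a file. The file name users.sql, its contents, and the helper show_difference are all hypothetical, invented for illustration.

```python
import difflib

# Hypothetical contents of one file: the last checked-in version
# versus the copy a developer has edited in a checked-out tree.
repository_version = [
    "create table users (\n",
    "    user_id integer primary key,\n",
    "    email varchar(100)\n",
    ");\n",
]
checked_out_version = [
    "create table users (\n",
    "    user_id integer primary key,\n",
    "    email varchar(100) not null,\n",
    "    registration_date date\n",
    ");\n",
]


def show_difference(old_lines, new_lines, filename):
    """Return unified-diff lines between the repository and working copies."""
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=f"repository/{filename}",
            tofile=f"checked-out/{filename}",
        )
    )


diff_lines = show_difference(repository_version, checked_out_version, "users.sql")
print("".join(diff_lines), end="")
```

Lines prefixed with "-" exist only in the repository copy and lines prefixed with "+" exist only in the checked-out copy; two identical copies produce an empty diff. A real repository would also store the history needed for the other three functions, which this sketch does not attempt.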
