
Software Engineering for Internet Applications - Student Community


how data are spread among physical disk drives. The database theoretician would note that our data model is in Second Normal Form but not in Third Normal Form. In a table that is part of a Third Normal Form data model, all columns are directly dependent on the whole key. The column current_version_p is not dependent on the table key but rather on two other non-key columns (editorial_status and version_date). SQL programmers refer to this kind of performance-enhancing storage of derivable data as "denormalization".

If you want to serve 10 million requests per day directly from an RDBMS running on a server of modest capacity, you may need to break some rules. However, the most maintainable production data models usually result from beginning with Third Normal Form and adding a handful of modest and judicious denormalizations that are documented and justified.

Note that any data model in Third Normal Form is also in Second Normal Form. A data model in Second Normal Form is in First Normal Form.

6.14 Version Control (for Computer Programs)

Note that a solution to the version control problem for site content (stuff in the database) still leaves you, as an engineer, with the problem of version control for the computer programs that implement the site. These are most likely in the operating system file system and edited by a handful of professional software developers. During this class you may decide that it is not worth the effort to set up and use version control, in which case your de facto version control system becomes backup tapes, so make sure that you've got daily backups. However, in the long run you need to learn about approaches to version control for Internet application development.

Throughout this section, keep in mind that a project with a very clear publishing objective, specs that never change, and one very smart developer, does not need version control.
A project with evolving objectives, changing specifications, and multiple contributors needs version control.

Classical Solution: one development area per developer

Classically, version control is used by C developers with each C programmer working from his or her own directory. This makes sense because there is no persistence in the C world. Code is compiled. A binary runs that builds data structures in RAM. When the program terminates it doesn't leave anything behind. The entire "tree" of

A pragmatic approach would seem to start by keeping all the documents in the RDBMS: articles, user comments, discussion forum postings, etc. Either once per night or every time a new document was added, update a full-text search system's collection. Pages that are part of the standard user experience and workflow operate from the RDBMS. The search box at the upper right corner of every page, however, queries against the full-text search system. Let's call this a split-system design.

**** insert figure *****

Figure 1: A split-system approach to providing full-text search. The application's content is stored in a relational database management system. Scripts periodically maintain a second copy in a specialized text database. The Web server program performs queries, inserts, and updates to the RDBMS. When a user requests a full-text search, however, the query is sent to the text database.

One argument against the split-system approach is that two copies of the document collection are being kept. In an age of $200 disk drives of absurdly high capacity, this isn't a powerful argument.
It is nearly impossible to fill a modern disk drive with words typed by humans. One can fill up a disk drive with video or audio streams, but not text. And in any case some full-text search systems can build an index to a document collection without themselves keeping the original document around, i.e., you would in fact have only one copy of the document, in the RDBMS.

A second argument against using RDBMS and full-text search systems simultaneously is that the collections will get out of sync. If the Web server crashes in the middle of an RDBMS transaction, all work is rolled back. If the Web server was simultaneously inserting a document into a full-text search system, it is possible that the full-text database will contain a document that is not in fact available on the main pages of the site, which are generated from the RDBMS. Alternatively, the RDBMS insert might succeed while the full-text insert fails, leading to a document that is available on the site but not searchable. This argument, too, ultimately lacks power. It is true that the RDBMS is a convenient and nearly foolproof means of managing transactions and concurrency. However, it is not the only way. If one were to hire sufficiently careful programmers and sufficiently dedicated system and database administrators, it would be possible to keep two databases in sync.
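
One way such careful programmers might keep the two collections in sync is a periodic reconciliation script that treats the RDBMS as the source of truth. What follows is only a minimal sketch under stated assumptions, not a description of any particular product: the table documents, its columns document_id and body, the class TextIndex (a toy in-memory stand-in for a real full-text search system), and the function reconcile are all hypothetical names invented for illustration.

```python
import sqlite3


class TextIndex:
    """Toy inverted index; a hypothetical stand-in for a full-text search system."""

    def __init__(self):
        self.postings = {}    # word -> set of document ids containing it
        self.doc_ids = set()  # ids of every document currently indexed

    def add(self, doc_id, body):
        self.doc_ids.add(doc_id)
        for word in body.lower().split():
            self.postings.setdefault(word, set()).add(doc_id)

    def remove(self, doc_id):
        self.doc_ids.discard(doc_id)
        for ids in self.postings.values():
            ids.discard(doc_id)

    def search(self, word):
        return sorted(self.postings.get(word.lower(), set()))


def reconcile(conn, index):
    """Make the index agree with the RDBMS, which is the source of truth.

    Returns (ids added to the index, ids removed from the index).
    """
    rows = dict(conn.execute("SELECT document_id, body FROM documents"))
    db_ids = set(rows)
    orphans = index.doc_ids - db_ids   # indexed, but rolled back or deleted
    missing = db_ids - index.doc_ids   # on the site, but not yet searchable
    for doc_id in orphans:
        index.remove(doc_id)
    for doc_id in missing:
        index.add(doc_id, rows[doc_id])
    return sorted(missing), sorted(orphans)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (document_id INTEGER PRIMARY KEY, body TEXT)"
    )
    conn.execute("INSERT INTO documents VALUES (1, 'normal forms explained')")
    conn.execute("INSERT INTO documents VALUES (2, 'version control basics')")

    index = TextIndex()
    index.add(1, "normal forms explained")
    index.add(99, "posting whose transaction rolled back")  # orphan in the index

    added, removed = reconcile(conn, index)
    print("added:", added, "removed:", removed)
```

Run nightly, a script of this shape repairs both failure modes described above: the orphaned index entry is dropped and the unsearchable document is indexed. The sketch compares only document ids, so it would not notice a document whose text was edited in place; handling that would need something extra, such as comparing modification timestamps.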
