code files to begin with, what would have been an insignificant bug becomes a half-day project.

Let's walk through an example of how the software is arranged on the photo.net service. The server is configured to operate multiple Internet services. Each one is located at /web/service-name/, which means that all the directories associated with photo.net are underneath /web/photonet/. The page root for the site is /web/photonet/www/. The Web server is configured to look for "library" procedures (shared by multiple pages) in /web/photonet/tcl/, a name derived from the fact that photo.net is run on AOLserver, whose default extension language is Tcl.

RDBMS table, index, and stored procedure definitions for a module are stored in a single file in the /doc/sql/ directory (directory names in this chapter are relative to the Web server page root unless specified as absolute). The name for this file is the module name followed by a .sql extension, e.g., chat.sql for the chat module. Shared procedures for all modules are stored in the single library directory /web/photonet/tcl/, with each file named "modulename-defs.tcl", e.g., chat-defs.tcl.

Scripts that generate individual pages are parked at the following locations: /module-name/ for the user pages; /module-name/admin/ for the moderator pages, e.g., where a user with moderator privileges would go to delete a posting; /admin/module-name/ for the site administrator pages, e.g., where the service operator would go to enable or disable a service, delegate moderation authority to another user, etc.

A high-level document explaining each module is stored in /doc/module-name.html and linked from the index page in /doc/.
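The naming conventions above are regular enough to be written down as a small function. The following is only a sketch: the helper `module_layout` and the `ModuleLayout` record are hypothetical names invented for illustration (photo.net itself is scripted in Tcl), and the code assumes that page-root-relative directories such as /doc/sql/ sit directly under /web/photonet/www/.

```python
from dataclasses import dataclass

# Absolute locations taken from the text.
SERVICE_ROOT = "/web/photonet"
PAGE_ROOT = f"{SERVICE_ROOT}/www"      # Web server page root
LIBRARY_DIR = f"{SERVICE_ROOT}/tcl"    # shared ("library") procedures


@dataclass(frozen=True)
class ModuleLayout:
    """Hypothetical record of where one module's files are expected to live."""
    data_model: str
    shared_procedures: str
    user_pages: str
    moderator_pages: str
    site_admin_pages: str
    documentation: str


def module_layout(module: str) -> ModuleLayout:
    """Apply the site-wide conventions to a module name such as 'chat'."""
    return ModuleLayout(
        data_model=f"{PAGE_ROOT}/doc/sql/{module}.sql",
        shared_procedures=f"{LIBRARY_DIR}/{module}-defs.tcl",
        user_pages=f"{PAGE_ROOT}/{module}/",
        moderator_pages=f"{PAGE_ROOT}/{module}/admin/",
        site_admin_pages=f"{PAGE_ROOT}/admin/{module}/",
        documentation=f"{PAGE_ROOT}/doc/{module}.html",
    )
```

Calling `module_layout("chat")` yields chat.sql, chat-defs.tcl, and the three page directories in one place, which is the kind of lookup a new programmer otherwise performs by reading the conventions.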
This high-level document is intended as a starting point for programmers who are considering using the module or extending a feature of the module. The document has the following structure:

1. Where to find all the software associated with this module (site-wide conventions are nice but it doesn't hurt to be explicit).
2. Big picture information: Why was this module built? Why aren't/weren't existing alternatives adequate for solving the problem? What are the high-level good and bad features of this module? What choices were considered in developing the data model?

which is the IP address of photo.net's load balancer. The load balancer accepts the TCP connection on port 80 and waits for the Web client to provide a request line, e.g., "GET / HTTP/1.0". Only after that request has been received does the load balancer attempt to contact a Web server on the private network behind it.

Notice first that this sort of router provides some inherent security. The Web servers and RDBMS server cannot be directly contacted by crackers on the public Internet. The only ways in are via a successful attack on the load balancer, an attack on the Web server program (Microsoft Internet Information Server suffered from many buffer overrun vulnerabilities), or an attack on publisher-authored page scripts. The router also provides some protection against denial of service attacks. If a Web server is configured to spawn a maximum of 100 simultaneous threads, a malicious user can effectively shut down the site simply by opening 100 TCP connections to the server and then never sending a request line. The load balancers are smart about reaping such idle connections and in any case have very long queues.

The load balancer can execute arbitrarily complex algorithms in deciding how to route a user request.
It can forward the request to a set of front-end servers in a round-robin fashion, taking a server out of the rotation if it fails to respond. The load balancer can periodically pull load and health information from the front-end servers and send each incoming request to the least busy server. The load balancer can inspect the URI requested and route to a particular server, for example, sending any request that starts with "/discuss/" to the Windows machine that is running the discussion forum software. The load balancer can keep a table of where previous requests were routed and try to route successive requests from a particular user to the same front-end machine (useful in cases where state is built up in a layer other than the RDBMS).

Whatever algorithm the load balancer is using, a hardware failure in one of the front-end machines will generally result in the failure of only a handful of user requests, i.e., those in-process on the machine that actually fails.

How are load balancers actually built? It seems that we need a computer program that waits for a Web request, takes some action, then returns a result to the user. Isn't this what Web server programs do? So why not add some code to a standard Web server program, run the combination on its own computer and call that our load balancer? That's precisely the approach taken by the Zeus Load
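To make the routing strategies described above concrete, here is a minimal sketch of the decision logic only, with no networking. It combines three of the policies mentioned (URI-prefix routing, a table of previous routings, and round-robin with failed servers taken out of rotation) and omits the least-busy policy. The class name `LoadBalancer`, its methods, and the server names are all assumptions made for illustration and do not describe any particular product.

```python
from itertools import cycle


class LoadBalancer:
    """Toy routing table: decides which back-end server gets a request."""

    def __init__(self, servers, prefix_routes=None):
        self.servers = list(servers)                  # general-purpose front ends
        self.prefix_routes = dict(prefix_routes or {})  # e.g. {"/discuss/": "forum1"}
        self.healthy = set(self.servers) | set(self.prefix_routes.values())
        self.sticky = {}                              # client id -> last server used
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        """Take a server out of the rotation (e.g. it failed to respond)."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Put a recovered server back into service."""
        self.healthy.add(server)

    def _next_round_robin(self):
        # Try each server at most once, skipping any that are marked down.
        for _ in range(len(self.servers)):
            candidate = next(self._rotation)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy front-end servers")

    def route(self, client_id, uri):
        # 1. URI-based routing: some paths live on a dedicated machine.
        for prefix, server in self.prefix_routes.items():
            if uri.startswith(prefix):
                if server not in self.healthy:
                    raise RuntimeError(f"server for {prefix} is down")
                return server
        # 2. Sticky routing: reuse the client's previous server if it is still up.
        previous = self.sticky.get(client_id)
        if previous in self.healthy:
            return previous
        # 3. Otherwise fall back to round-robin and remember the choice.
        chosen = self._next_round_robin()
        self.sticky[client_id] = chosen
        return chosen
```

With servers `web1`, `web2`, `web3` and a `/discuss/` route to `forum1`, a client first sent to `web1` keeps returning there until `web1` is marked down, at which point the next request is reassigned to a healthy machine. A real load balancer would identify clients by IP address or cookie and would detect failures itself rather than being told.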
translation had elapsed--the site would be up and running and providing pages to hundreds of thousands of users worldwide, but not to those users who'd received an unlucky DNS translation to the dead machine. For a typical domain this period of time might be anywhere from 6 hours to 1 week. CNN, aware of this problem, could shorten the expiration and "minimum time-to-live" on cnn.com but if these were cut down to, say, 30 seconds, the load on CNN's name servers might start approaching the intensity of the load on its Web servers. Nearly every user page request would be preceded by a request for a DNS translation. (In fact CNN set their minimum time-to-live to 15 minutes.)

A final problem with round-robin DNS is that it does not provide abstraction. Suppose that CNN, whose primary servers were all Unix machines, wished to run some discussion forum software that was only available for Windows. The IP addresses of all of its servers are publicly exposed. The only way to direct users to a different machine for a particular part of the service would be to link them to a different hostname, which could therefore be translated into a distinct IP address. For example, CNN would link users to "http://forums.cnn.com". Users who enjoyed these forums would bookmark the URL and other sites on the Internet would insert hyperlinks to this URL. After a year, suppose that the Windows servers were dying and the people who knew how to maintain them had moved on to other jobs. Meanwhile, the discussion forum software has become available for Unix as well. CNN would like to pull the discussion service back onto its main server farm, at a URL of http://www.cnn.com/discuss/.
Why should users be aware of this reshuffling of hardware?

**** insert drawing of server farm (cloud), load balancer, public Internet (cloud) ****

Figure 2: To preserve the freedom of rearranging components within the server farm, typically users on the public Internet only talk to a load balancing router, which is the "public face" of the service and whose IP address is what www.popularservice.com translates to.

The modern approach to load balancing is the load balancing router. This machine, typically built out of standard PC hardware running a free Unix operating system and a thin layer of custom software, is the only machine that is visible from the public Internet. All of the server hardware is behind the load balancer and has IP addresses that aren't routable from the rest of the Internet. If a user requests www.photo.net, for example, this is translated to 216.127.244.133,

3. Configuration information: What can be changed easily by editing parameters?
4. Use and maintenance information.

For an example of such a document, see http://philip.greenspun.com/seia/examples-software-modularity/chat.

7.2 Shared Procedures versus Stored Procedures

Even in the simplest Web development environments there are generally at least two places where procedural abstractions, i.e., fragments of programs that are shared by multiple pages, can be developed. Modern relational database management systems can interpret Turing-complete imperative programming languages such as C#, Java, and PL/SQL. Thus any computation that could be performed by any computer could, in principle, be performed by a program running inside an RDBMS such as Microsoft SQL Server, Oracle, or PostgreSQL.
I.e., you don't need a Web server or any other tools but could implement page scripting and an HTTP server within the database management system, in the form of stored procedures.

As we'll see in the "Scaling Gracefully" chapter there are some performance advantages to be had in splitting off the presentation layer of an application into a set of separate physical computers. Thus our page scripts will most definitely reside outside of the RDBMS. This gives us the opportunity to write additional software that will run within or close to the Web server program, typically in the same computer language that is used for page scripting, in the form of shared procedures. In the case of a PHP script, for example, a shared procedure could be an include file. In the case of a site where individual pages are scripted in Java or C#, a shared procedure might be some classes and methods used by multiple pages.

How do you choose between using shared procedures and stored procedures? Start by thinking about the multiple applications that may connect to the same database. For example, there could be a public Web server, a nightly program that pulls out all new information for analysis, a maintenance tool for administrators built on top of Microsoft Excel or Access, etc.

If you think that a piece of code might be useful to those other systems that connect to the same data model, put it in the database as a stored procedure. If you are sure that a piece of code is only
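As a minimal sketch of the shared-procedure side of this choice, the following shows presentation code that only page scripts would ever need, factored into procedures that several pages call. Everything here is hypothetical: the function names, the page contents, and the contact address are invented for illustration, and Python stands in for whatever language the page scripts are written in.

```python
from html import escape


def page_header(title: str) -> str:
    """Shared procedure: the common opening HTML for every page on the site."""
    safe_title = escape(title)  # guard against markup characters in the title
    return (
        f"<html><head><title>{safe_title}</title></head>"
        f"<body><h2>{safe_title}</h2>"
    )


def page_footer(contact_email: str = "webmaster@example.com") -> str:
    """Shared procedure: the common closing HTML, with a placeholder address."""
    return f"<hr><address>{escape(contact_email)}</address></body></html>"


def chat_room_page(room_name: str) -> str:
    """One page script that uses the shared procedures."""
    body = "<p>Messages would be listed here.</p>"
    return page_header(f"Chat: {room_name}") + body + page_footer()


def forum_index_page() -> str:
    """A second page script reusing the same shared procedures."""
    body = "<ul><li>Topics would be listed here.</li></ul>"
    return page_header("Discussion Forums") + body + page_footer()
```

Because the header and footer concern only HTML presentation, a nightly analysis program or an administrator's spreadsheet tool connecting to the same database would have no use for them, which is the situation in which code belongs outside the RDBMS.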