13.07.2015 Views

Software Engineering for Internet Applications - Student Community




code files to begin with, what would have been an insignificant bug becomes a half-day project.

Let's walk through an example of how the software is arranged on the photo.net service. The server is configured to operate multiple Internet services. Each one is located at /web/service-name/ which means that all the directories associated with photo.net are underneath /web/photonet/. The page root for the site is /web/photonet/www/. The Web server is configured to look for "library" procedures (shared by multiple pages) in /web/photonet/tcl/, a name derived from the fact that photo.net is run on AOLserver, whose default extension language is Tcl.

RDBMS table, index, and stored procedure definitions for a module are stored in a single file in the /doc/sql/ directory (directory names in this chapter are relative to the Web server page root unless specified as absolute). The name for this file is the module name followed by a .sql extension, e.g., chat.sql for the chat module. Shared procedures for all modules are stored in the single library directory /web/photonet/tcl/, with each file named "modulename-defs.tcl", e.g., chat-defs.tcl.

Scripts that generate individual pages are parked at the following locations: /module-name/ for the user pages; /module-name/admin/ for the moderator pages, e.g., where a user with moderator privileges would go to delete a posting; /admin/module-name/ for the site administrator pages, e.g., where the service operator would go to enable or disable a service, delegate moderation authority to another user, etc.

A high-level document explaining each module is stored in /doc/module-name.html and linked from the index page in /doc/.
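
The naming conventions above are regular enough to compute. The following is a minimal sketch in Python, not code from photo.net itself (which the text says runs on AOLserver and Tcl). The names `ModuleLayout` and `module_layout` are hypothetical, and the sketch assumes that page-root-relative paths such as /doc/sql/ resolve under /web/service-name/www/.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ModuleLayout:
    """Where the pieces of one module live (hypothetical helper type)."""
    data_model: PurePosixPath        # table, index, stored procedure definitions
    shared_procs: PurePosixPath      # "library" procedures shared by pages
    user_pages: PurePosixPath
    moderator_pages: PurePosixPath
    site_admin_pages: PurePosixPath
    documentation: PurePosixPath     # high-level module document


def module_layout(service: str, module: str) -> ModuleLayout:
    """Apply the site-wide conventions to one service and module name."""
    service_root = PurePosixPath("/web") / service
    # Assumption: paths given relative to the page root sit under www/.
    page_root = service_root / "www"
    return ModuleLayout(
        data_model=page_root / "doc" / "sql" / f"{module}.sql",
        shared_procs=service_root / "tcl" / f"{module}-defs.tcl",
        user_pages=page_root / module,
        moderator_pages=page_root / module / "admin",
        site_admin_pages=page_root / "admin" / module,
        documentation=page_root / "doc" / f"{module}.html",
    )


if __name__ == "__main__":
    layout = module_layout("photonet", "chat")
    for name, path in vars(layout).items():
        print(f"{name:18} {path}")
```

A helper like this could back a consistency check that flags modules missing their .sql file or documentation page, though the text does not say photo.net used one.
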
This high-level document is intended as a starting point for programmers who are considering using the module or extending a feature of the module. The document has the following structure:

1. Where to find all the software associated with this module (site-wide conventions are nice but it doesn't hurt to be explicit).
2. Big picture information: Why was this module built? Why aren't/weren't existing alternatives adequate for solving the problem? What are the high-level good and bad features of this module? What choices were considered in developing the data model?

which is the IP address of photo.net's load balancer. The load balancer accepts the TCP connection on port 80 and waits for the Web client to provide a request line, e.g., "GET / HTTP/1.0". Only after that request has been received does the load balancer attempt to contact a Web server on the private network behind it.

Notice first that this sort of router provides some inherent security. The Web servers and RDBMS server cannot be directly contacted by crackers on the public Internet. The only ways in are via a successful attack on the load balancer, an attack on the Web server program (Microsoft Internet Information Server suffered from many buffer overrun vulnerabilities), or an attack on publisher-authored page scripts. The router also provides some protection against denial-of-service attacks. If a Web server is configured to spawn a maximum of 100 simultaneous threads, a malicious user can effectively shut down the site simply by opening 100 TCP connections to the server and then never sending a request line. The load balancers are smart about reaping such idle connections and in any case have very long queues.

The load balancer can execute arbitrarily complex algorithms in deciding how to route a user request.
It can forward the request to a set of front-end servers in a round-robin fashion, taking a server out of the rotation if it fails to respond. The load balancer can periodically pull load and health information from the front-end servers and send each incoming request to the least busy server. The load balancer can inspect the URI requested and route to a particular server, for example, sending any request that starts with "/discuss/" to the Windows machine that is running the discussion forum software. The load balancer can keep a table of where previous requests were routed and try to route successive requests from a particular user to the same front-end machine (useful in cases where state is built up in a layer other than the RDBMS).

Whatever algorithm the load balancer is using, a hardware failure in one of the front-end machines will generally result in the failure of only a handful of user requests, i.e., those in-process on the machine that actually fails.

How are load balancers actually built? It seems that we need a computer program that waits for a Web request, takes some action, then returns a result to the user. Isn't this what Web server programs do? So why not add some code to a standard Web server program, run the combination on its own computer and call that our load balancer? That's precisely the approach taken by the Zeus Load
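
The four routing strategies described above (round-robin with failed servers skipped, least busy, URI prefix, and per-user stickiness) can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not how any particular load balancer product works. The class names, the `load` counter, the `client_id` key, and the order in which `route` tries the strategies are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    name: str
    healthy: bool = True   # would be updated by periodic health polls
    load: int = 0          # e.g. requests in progress, as last reported


@dataclass
class LoadBalancer:
    backends: list[Backend]
    prefix_routes: dict[str, Backend] = field(default_factory=dict)
    sticky: dict[str, Backend] = field(default_factory=dict)
    _next: int = 0

    def round_robin(self) -> Backend:
        """Cycle through the servers, skipping any marked unhealthy."""
        for _ in range(len(self.backends)):
            backend = self.backends[self._next % len(self.backends)]
            self._next += 1
            if backend.healthy:
                return backend
        raise RuntimeError("no healthy front-end servers")

    def least_busy(self) -> Backend:
        """Pick the healthy server reporting the lowest load."""
        live = [b for b in self.backends if b.healthy]
        if not live:
            raise RuntimeError("no healthy front-end servers")
        return min(live, key=lambda b: b.load)

    def route(self, client_id: str, uri: str) -> Backend:
        """Assumed precedence: URI prefix, then stickiness, then round-robin."""
        for prefix, backend in self.prefix_routes.items():
            if uri.startswith(prefix) and backend.healthy:
                return backend
        previous = self.sticky.get(client_id)
        if previous is not None and previous.healthy:
            return previous
        backend = self.round_robin()
        self.sticky[client_id] = backend  # remember for later requests
        return backend


if __name__ == "__main__":
    web1, web2 = Backend("web1"), Backend("web2")
    forum = Backend("windows-forum")
    lb = LoadBalancer([web1, web2], prefix_routes={"/discuss/": forum})
    print(lb.route("user-a", "/discuss/thread-1").name)
    print(lb.route("user-a", "/gallery/").name)
    print(lb.route("user-a", "/gallery/2").name)
```

In this model a failed machine only needs `healthy` set to `False`; later requests, including sticky ones, fall through to the remaining servers, which matches the observation that only requests already in process on the failed machine are lost.
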
