DATABASE TECHNOLOGY

Scaling Out SQL Server with Data-Dependent Routing

To scale out database applications efficiently and cost-effectively, enterprises can use a data-dependent routing (DDR) method to partition and access data across an array of industry-standard, symmetric multiprocessing nodes. This article explains how administrators can implement a DDR approach to scale out highly available Microsoft® SQL Server database-based enterprise applications, and then presents benchmark results from a pilot study in which Microsoft engineers simulated a real-world enterprise scenario, part of the Communication Services Platform for MSN (The Microsoft Network). The pilot configuration ran Microsoft SQL Server 2005 Beta 2 on Dell PowerEdge servers, Dell/EMC storage, and Emulex network adapters.

BY MAN XIONG, BRIAN GOLDSTEIN, AND CHRIS AUGER

Related Categories: Pilot study, Database, Dell/EMC storage, Emulex, Microsoft SQL Server, Microsoft Windows, Performance

Visit www.dell.com/powersolutions for the complete category index.

The past decade has witnessed tremendous growth in enterprise applications on the Internet, and along with that, an explosion of data storage requirements. Today, many organizations must contend with the possibility that millions of online users will be using the Internet for common transactions such as shopping, storing e-mail messages, and viewing financial information. Database systems are at the heart of most enterprise applications, and Microsoft SQL Server is a leading database platform in large data centers, particularly those supporting online transactions.

A scalable database platform allows application designers to start small and grow a system as large as necessary. Traditionally, administrators have achieved scalability with symmetric multiprocessing (SMP) servers, using a scale-up approach by adding processors, memory, disks, and network adapters to a single server when required.
As the workload increases, however, a single database server (also referred to as a node in this article) will eventually hit a bottleneck and be unable to support further capacity. This situation typically becomes evident as diminishing performance returns when additional system components are added to the chassis, or as prohibitively expensive hardware upgrades needed to achieve the requisite performance.

To enable growth much beyond a factor of 10, which is generally considered to be the practical limit for scaling up a monolithic SMP server, application designers may adopt a scale-out architecture in which the workload and database can be partitioned among an array of industry-standard SMP nodes. A scale-out architecture enables administrators to grow systems incrementally by adding more nodes to the array. Such an array is often referred to as a federation of servers. Ideally, the partitioning is easy to manage and transparent to the application.

Reprinted from Dell Power Solutions, August 2005. Copyright © 2005 Dell Inc. All rights reserved.
The scale-out approach based on a federation of servers is designed to provide several important advantages, including the following:

• Cost-effectiveness: SMP node arrays can be expanded in small increments of standards-based commodity components quickly and flexibly, in response to immediate business needs.
• Fault tolerance: Redundant system components can help prevent downtime so that a failure in one node does not necessarily cause the entire SMP array to become unavailable.
• High availability: The relative independence of nodes in the SMP array helps facilitate server failover, enabling high availability of enterprise applications.

Despite such attractive benefits, the scale-out approach can pose complex management challenges because a federation of servers typically involves many more components than a monolithic SMP server. Also, not every application can be conveniently partitioned across nodes, so the scale-out approach may work well for some applications but be unsuitable for others.

Data-dependent routing

A key design decision for any scaled-out database platform is to determine the best way to partition data among the different nodes in an SMP array. Many applications can be partitioned by a particular value such as customer name, store location, time/date, register name, and so forth. The important consideration is how the data will be accessed. For example, the retail application for a large insurance company might be partitioned by branch office, allowing each branch office to maintain client records locally. In such a scenario, all data access would be local and a batch job could replicate new or modified records to the central SQL Server array at headquarters every night.
In most cases, agents working at the branch offices would not access the central SQL Server array, and corporate analysts could run their reports against the central database without accessing every branch-office SQL Server node.

Data-dependent routing (DDR) requires intelligence in the client application, typically in the middle-tier layer, to route database requests to the appropriate node. However, the DDR approach does not provide views across nodes. With the exception of shared database schema, each node in a federation of servers is designed to operate independently. The middle tier contains the mappings that describe how data is partitioned and which node contains specific data.

In the preceding retail insurance scenario with partitioning, the application must be designed to track where records are located. This example assumes that a SQL Server database has been created on the middle-tier Web server. The database has been partitioned by customer ID across a number of federated servers, and a lookup table on the middle tier maps each customer ID to the node where the corresponding data partition resides.

Figure 1 shows a sample lookup table. When the customer service representative needs to display all transaction records for customer 10015, the application sends the request only to node 1. Because access is localized to a single node, no requests regarding customer 10015 are sent to nodes 2 and 3.

Since the application was partitioned by customer ID, records for each product ID could be stored on every database node. That would require the application to query every node, bring back all records that match each product ID, and then combine and sort the results. When product ID access is not localized, this could be a time-consuming operation.
However, product ID updates could be run as a background batch job to help avoid slowing the response time for user transactions.

Challenges to the scale-out approach

When scaling out an application, administrators must address several complexities involving management, data partitioning, application development and revision, and high-availability practices. Such challenges include the following considerations:

• Management: The larger number of nodes in a federation of servers compared to a monolithic SMP server can increase overhead for operations management. For example, administrators must plan maintenance across an array of SMP nodes rather than a single node, and find ways to add and remove nodes without affecting application availability.
• Data partitioning: As enterprise applications grow, business needs may change, which may in turn require administrators to alter the way data is partitioned across the federation of servers. In addition, load balancing across an array of SMP nodes can be difficult to achieve because "hot spots" can develop on some nodes while other nodes remain relatively idle.
• Application development and revision: As business requirements and data access needs change, administrators must modify database schema, and find ways to implement these modifications without affecting application availability.
• High-availability practices: Administrators must determine how to handle single-node failures without disrupting application availability, and find ways to minimize the time it takes to restore a database to a single node.

Such challenges will be addressed in the following sections, which describe a successful scale-out scenario for a large-scale enterprise application.

Customer ID    Node ID
10015          1
10016          2
10017          1
10018          3

Figure 1. Lookup table mapping customer IDs to nodes where the corresponding data partitions reside
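The routing behavior described for the Figure 1 lookup table can be illustrated with a minimal Python sketch. It is not the pilot study's implementation: the names CUSTOMER_TO_NODE, NODES, route, transactions_for_customer, and transactions_for_product are hypothetical, and the nodes are in-memory lists with made-up rows standing in for real SQL Server connections. The sketch contrasts a localized lookup by customer ID with the fan-out that a product ID query would need.

```python
# Hypothetical lookup table mirroring Figure 1: customer ID -> node ID.
CUSTOMER_TO_NODE = {10015: 1, 10016: 2, 10017: 1, 10018: 3}

# In-memory stand-ins for the federated database nodes (illustrative data only).
# Each row is (customer_id, product_id, amount).
NODES = {
    1: [(10015, "P1", 20.0), (10017, "P2", 5.0), (10015, "P2", 7.5)],
    2: [(10016, "P1", 12.0)],
    3: [(10018, "P2", 3.0)],
}


def route(customer_id):
    """Return the node ID that holds the partition for this customer."""
    try:
        return CUSTOMER_TO_NODE[customer_id]
    except KeyError:
        raise LookupError(f"no node mapped for customer {customer_id}") from None


def transactions_for_customer(customer_id):
    """Localized access: query only the single node that owns the customer."""
    node_id = route(customer_id)
    return [row for row in NODES[node_id] if row[0] == customer_id]


def transactions_for_product(product_id):
    """Non-localized access: query every node, then combine and sort."""
    matches = []
    for node_id in sorted(NODES):
        matches.extend(row for row in NODES[node_id] if row[1] == product_id)
    return sorted(matches)
```

A request for customer 10015 touches only node 1, whereas a request for product "P2" has to visit all three nodes before the results are merged, which is why the article suggests that non-localized work is better suited to background batch jobs.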