
Critical Incidents in Performance Testing - The Workshop On ...



WOPR8 Experience Report
Critical Incidents in Performance Testing
Dan Downing – April 7, 2007
MENTORA GROUP

Abstract

Having participated in over 100 performance tests for more than 75 companies over the last 8 years, spanning in complexity from small web retail sites to enterprise ERP systems, I have experienced critical incidents in all four categories – process, tools, teams, and stakeholder engagement. Each critical incident taught an important lesson that has raised the quality of our service and the performance IQ of our testing team.

In this experience report, I will present two salient critical incidents that should provide other WOPR attendees with specific techniques in modeling with end-user participation, tool vetting, understanding underlying technology, and results reporting.

Lawsuit Anyone?

In October of 2000 I spent two weeks onsite at Kmart's Bluelight.com space across from Fisherman's Wharf in San Francisco, with one of my senior performance engineers, stress testing their Martha Stewart-branded web store in preparation for the holiday season.
Keith and I were embedded with their development and QA teams in a fishbowl conference room, working 12+ hours a day, modeling and testing the typical mix of ecommerce transactions (browse, search, add-to-cart, checkout, register, etc.) with LoadRunner.

It was a dynamically evolving site: the business owners kept tuning their store model, the "creatives" kept changing section designs, developers kept modifying and fixing code, and the infrastructure team scrambled to build the production infrastructure hosted at a Bay Area hosting facility about an hour away. Despite the long hours and the constant re-recording and tweaking of our LoadRunner scripts, it was a frenetic but infectious "open bullpen" environment, with 24/7 free M&M's, soft drinks, coffee, and food, and where Martha herself could often be seen on the floor chatting it up with the business team.

We ran well over a dozen tests up to 4000 concurrent users, driving load from Mercury's hosted load injectors in Santa Clara, and from our own injectors in Memphis.
When the Web Director declared that site performance was "good enough" in early October, as they went into the final critical activities to go live, I asked him for details about the software version and the environment we had tested, for our final report. He brushed us off, saying that the graphs and interim results we'd already generated were good enough. With that we packed our bags, high-fived the development team, and left.

Fast-forward to the day after Thanksgiving: several large sites, including Amazon, BestBuy, PayPal, and, yes, BlueLight, had experienced performance hell during the start of the shopping onslaught. Our CEO was getting calls from BlueLight, and from the media, and we were suddenly on the defensive.

Our CEO called a big meeting that afternoon: What had happened? Hadn't we validated their site for 4000 concurrent users (no, we never got that high)? The Web Director was pressing us. What facts did we have to respond with?

I knew that we'd left on October 2nd, when the Director had said we'd done enough. I knew their application was still undergoing changes, and would be right up to go-live in late October. I had spreadsheets with logs of the 18 test sessions we'd conducted. I had tons of graphs, and status reports identifying issues we'd found (both with our own test environment and with their site). I knew that the ATG Dynamo performance expert was still on-site after we left, trying to fix the performance-halting garbage collection "heartbeat" problem we'd identified.

Atlanta Boston Washington DC www.mentora.com


But I could not provide: the version of the release we'd last tested (if there even was one); the configuration of the environment we had last tested on; the extent to which the final product catalog had been loaded into the database at the time of testing; or even the size of the database.

Net-net: I could not produce the report we'd presented to the customer—because we hadn't. I'd allowed the brash, impatient director to talk me out of producing it—which would have forced me to drag the aforementioned specifics out of the systems guy, the DBA, and the Sr. Developer.

The following Monday, December 4, InformationWeek came out with a front-page article on e-retailing, featuring none other than the Web Director at BlueLight, a quote about having used AppGenesys (the company I worked for at the time) to do their stress testing, an admission that page response times had gone up to 30 seconds during the peak, and a defensive quote from our CEO back-pedaling about the "unpredictability of the web, and all the factors you can't control".

Thanks to the frenetic pace of the time, some fast talking by our CEO, and providence, we escaped a lawsuit and any really bad press, and no one got fired. But for me it was then – and is still now – my very first (certainly not my last) critical incident in performance testing.

Since then, I always generate a final report – whether the customer values it or not – and I find a way to present it, even if only via email (which I carefully file).
Specifically:

- I document exactly what we tested: the software release, the business processes, the configuration of the system-under-test.
- I note the date and times of our "best and final" test.
- I describe how and what we tested, including any limitations or simplifying assumptions we were forced to make.
- Key findings, conclusions and recommendations are made with carefully chosen language, being careful not to overstate, be overly optimistic, use overly positive language, or assert 100% confidence in our results.

(Click here for sample report and InformationWeek article.)

Know Thy Test Tools

Ever been caught in embarrassing situations with your test tools about your knees? Well I have. Twice.

10,000 User Test with OpenSTA

A company that develops and markets an on-line student testing software and infrastructure package was being evaluated for scalability by the State of Oregon—which, if accepted, would result in a major purchase. We were contracted through one of our partners to do the testing. I'd never run a test this size, not even with LoadRunner, and OpenSTA was the only tool we could use (based on licensing costs and our experience). I said "sure, we can do that!" Oh, by the way: up to this point the biggest test we'd ever run with OpenSTA was 1500 users…

The test was to simulate 10,000 students logging in under unique IDs, answering 35 questions, and logging out, with an end-to-end duration of 50 to 60 minutes, including think-time.
We would launch users at a rate of 5 per second, ramping up to 10K over a 35 minute period. The testing would occur on the company's production environment, which required scheduling each test at night, and substantial environment preparation by the customer for each test.

The system-under-test consisted of 28 web servers running Linux and Resin, and a database server running PostgreSQL. We would run two preliminary tests at 2500 and 5000 users before the final test at 10,000—all this in a 2-3 week period starting the week of Thanksgiving (2004).

We kicked off the project two weeks before Thanksgiving. Script development went smoothly, as did a 30-user shakedown test conducted the Tuesday before Thanksgiving. To drive this load, we pressed into service a big honking server: a quad 1.5 GHz hyper-threaded Xeon Dell 6650 with 4 GB memory, a 160 GB ATA disk, and dual GigE network cards. We were ready—and confident—for the 2,500-user test on Wednesday afternoon.

The first test started, users started ramping (at 3 users/second for this test), and everything seemed to be working well at our end, and the customer was seeing users logging in. But the good news was short-lived: at about 6 minutes into the test, OpenSTA was showing about 900 users running, but network throughput effectively stopped. We and the customer tried to diagnose on the fly—network failure at either end? A system event on the load injector? The customer reported "runaway http connections". We stopped the test and agreed to diagnose fully and get back together the day after Thanksgiving.

Was it the new load injector? Was the problem with hyper-threading? Was our firewall overloaded? Some issue with OpenSTA? The load injector event log showed nothing. Our perfmon logs revealed no limiting resource. Our networking guys examined our firewall stats and concluded it had plenty of capacity. What to do next? Isolate the variables, research each one, etc.
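As a quick sanity check, the launch rates quoted above line up with what OpenSTA reported. This is simple arithmetic on figures already stated in the report; nothing here is new data:

```python
# Sanity checks on the ramp figures quoted above (source numbers only).

# Full 10K test: launching 5 users/second, reaching 10,000 users takes
full_ramp_minutes = 10_000 / 5 / 60
print(round(full_ramp_minutes, 1))  # prints 33.3 -- consistent with the
                                    # planned "35 minute" ramp period

# First 2,500-user test: at 3 users/second, six minutes into the test
users_launched = 3 * 6 * 60
print(users_launched)  # prints 1080 -- the same ballpark as the ~900
                       # users OpenSTA showed when throughput stopped
```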
This took a few days…and cost us precious time and credibility.

Our performance engineer Linda dug into the OpenSTA FAQs and forum and came up with a hypothesis: given the high ramp rate, and the script's large number of steps opening a large number of sockets for page resources that were being haphazardly closed by the default DISCONNECT nn statements OpenSTA inserts during recording—we were running out of sockets! That's when we learned about explicitly inserting DISCONNECT ALLs after each page. We conducted an internal test with and without SYNCHRONIZE REQUESTS, running on our hyper-threaded injector, and proved conclusively that this was an issue.

We finally conducted a successful 2500-user test on December 6—gaining back a little customer confidence, and moving us forward to schedule the 5K test—though with internal confidence greatly weakened, and imagining more disasters lurking.

Well, guess what limit we ran into next? Yes, the infamous 1664-virtual-user-per-Commander OpenSTA limitation, related to the way it uses file handles and the file handle limit on Windows Server. More egg on face, more time and credibility lost, while we scrambled to cobble together more load injectors, then painstakingly merge Timer results to graph results.

Our customer contact escalated his frustrations to his boss, who then called our partner's CEO, who then called our CEO—it was not pretty.
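The socket-exhaustion hypothesis, and the injector count the 1664-user Commander limit forces, can both be roughed out numerically. This is a back-of-the-envelope sketch only: the ephemeral-port range and the leaked-sockets-per-user figure are illustrative assumptions, not measurements from the test.

```python
import math

# Assumption: Windows Server of that era defaulted its ephemeral client
# ports to the range 1025-5000 (the MaxUserPort setting), so a single
# injector has roughly this many ports to spend:
usable_ports = 5000 - 1025  # 3975

# Assumption: with the recorded DISCONNECT nn statements closing sockets
# haphazardly, each virtual user leaves a handful of sockets open.
leaked_sockets_per_user = 4  # illustrative guess, not a measured value

users_at_port_exhaustion = usable_ports // leaked_sockets_per_user
print(users_at_port_exhaustion)  # prints 993 -- the same neighborhood as
                                 # the ~900 users running when the
                                 # 2,500-user test stalled

# Source number: a 1664-virtual-user-per-Commander limit implies, for 10K:
injectors_needed = math.ceil(10_000 / 1664)
print(injectors_needed)  # prints 7 -- matching the seven injectors
                         # eventually pressed into service
```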
But somehow we managed to keep them from throwing us out (they were as committed as we were by now)—and by January 4 we were able to press seven injectors into service, field a team of 5 people to coordinate launching separate Commanders on each box, with our networking guys monitoring our entire infrastructure, and conduct a successful 10K test. After the test we had to do some creative merging and massaging of the huge Timer and HTTP logs, and finally were able to publish a report, showing good scalability at the target load, that the customer could present to the State of Oregon. That deal—and our butts—were saved.

Critical Incident #2: Know your testing tools—their capabilities and limitations—and TEST TEST TEST the capacity and limitations of your load injectors and test environment.

Having learned this hard lesson, now ask me about the limitations of using LoadRunner to test a Citrix application. ARGH!
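Merging per-Commander Timer results by hand is painful, but the chore is easy to script. A minimal sketch of the idea (the CSV layout of timer name, timestamp, elapsed milliseconds is hypothetical, not OpenSTA's actual export format):

```python
import csv
import glob

def merge_timer_logs(pattern: str, out_path: str) -> int:
    """Merge per-injector timer CSVs into one file sorted by timestamp.

    Assumes a hypothetical layout: timer_name,timestamp,elapsed_ms
    (one row per timer sample; real OpenSTA exports differ).
    Returns the number of rows merged.
    """
    rows = []
    for path in glob.glob(pattern):
        with open(path, newline="") as f:
            for name, ts, elapsed in csv.reader(f):
                rows.append((float(ts), name, int(elapsed)))
    rows.sort()  # chronological order across all injectors
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for ts, name, elapsed in rows:
            writer.writerow([name, ts, elapsed])
    return len(rows)
```

With one file per injector (say, `inj1.csv` through `inj7.csv`), a single call produces one chronologically ordered result set that can be graphed as if it came from one Commander.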
