[Figure 2: Database Cache Size Versus Throughput]

processes. For example, an 8-GB system allows 6.6 GB to be used for the database cache.

Performance Analysis

Why does the use of VLM improve performance by a factor of nearly 2? Using statistics within the database, we measured the database-cache hit ratio as memory was added. Figure 3 shows the direct correlation between more memory and decreased database-cache misses: as memory is added, the database-cache miss rate declines from 12 percent to 5 percent. This raises two more questions: (1) Why does the database-cache miss rate remain at 5 percent? and (2) Why does a small change in database-cache miss rates improve the throughput so greatly?

The answer to the first question is that with a database size of more than 100 GB, it is not possible to cache the entire database. The cache improves the transactions that are read-intensive, but it does not entirely eliminate I/O contention.

[Figure 3: Cache Miss Rates and Bus Utilization. Key: bus utilization, B-cache miss rate, I-cache miss rate, and database-cache miss rate, plotted against memory size in GB (1 to 6).]

To answer the second question, we need to look at the AlphaServer 8400 system's hardware counters that measure instruction-cache (I-cache) miss rate, board-cache (B-cache) miss rate, and the bandwidth used on the multiprocessor bus. With an increase in throughput and memory size, the VLM system is spanning a larger data space, and the bus utilization increases from 24 percent to 32 percent. Intuitively, one might think this would result in less optimal instruction- and data-stream locality, thus increasing both miss rates. As shown in Figure 3, this proved true for instruction-stream misses (I-cache miss rate) but not true for the data stream, as represented by the B-cache miss rate.
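The second question can be illustrated with a back-of-the-envelope model. The latency and access-count figures below are hypothetical placeholders, not measurements from this study; they serve only to show how a drop in database-cache miss rate from 12 percent to 5 percent can roughly double throughput when misses stall for disk I/O:

```python
# Hypothetical model: each transaction does some in-memory work plus,
# on each database-cache miss, a stall waiting for disk I/O.
# All three constants are assumed for illustration only.

CPU_TIME_MS = 5.0        # in-memory work per transaction (assumed)
IO_STALL_MS = 30.0       # stall per database-cache miss (assumed)
ACCESSES_PER_TXN = 10    # database-cache accesses per transaction (assumed)

def txn_time_ms(miss_rate):
    """Average transaction time for a given database-cache miss rate."""
    return CPU_TIME_MS + ACCESSES_PER_TXN * miss_rate * IO_STALL_MS

t_small = txn_time_ms(0.12)   # small-memory configuration: 12% miss rate
t_vlm = txn_time_ms(0.05)     # VLM configuration: 5% miss rate

print(f"without VLM: {t_small:.1f} ms/txn")
print(f"with VLM:    {t_vlm:.1f} ms/txn")
print(f"throughput ratio: {t_small / t_vlm:.2f}x")
```

With these assumed numbers the ratio comes out near 2; the point is only that when I/O stall time dominates transaction latency, a modest change in miss rate swings throughput strongly.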
The instruction stream rarely results in B-cache misses, so B-cache misses can be attributed primarily to the data stream.

Performance analysis requires careful examination of the throughput of the system under test. The apparent paradox just related can be resolved if we normalize the statistics to the throughput achieved. Figure 4 shows that the instruction-cache misses per transaction declined slightly as the memory size was increased from 1 GB to 6 GB and as transaction throughput doubled. Furthermore, the B-cache works substantially better with more memory: misses declined by 25 percent on a per-transaction basis. Why is this so?

Analysis of the system monitor data for each run indicates that bringing the data into memory helped reduce the I/O per second by 30 percent. If the transaction is forced to wait for I/O operations, it is done asynchronously, and the database causes some other thread to begin executing. Without VLM, 12 percent of transactions miss the database cache and thus stall for I/O activity. With VLM, only 5 percent of the transactions miss the database cache, and the time to perform each transaction is greatly reduced. Thus each thread or process has a shorter transaction latency. The shorter latency contributes to a 15-percent reduction in system context-switch rates. We attribute the measured improvement in hardware miss rates per transaction when using VLM to the improvement in context switching.

The performance counters on the Alpha microprocessor were used to collect the number of instructions issued and the number of cycles. In Table 2, the relative instructions-per-transaction results are the ratios of instructions issued per second divided by the number of new-order transactions.
(In TPC-C, each transaction has a different code path and instruction count; therefore the instructions-per-transaction amount is not the total number of new-order transactions.) The relative difference between instructions per transaction for 1 GB of database memory versus 6 GB of database memory is the measured effect of eliminating 30 percent of the I/O operations, satisfying more transactions from main memory, reducing context switches, and reducing lock contention.

Vol. 8 No. 3 1996
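The normalization behind Table 2 (and behind the per-transaction miss rates above) can be sketched as follows. The counter readings here are invented placeholders, not the paper's data; the arithmetic is the point: divide a per-second counter by new-order transactions per second, then express each configuration relative to the 1-GB baseline:

```python
# Hypothetical counter readings (illustrative only, not Table 2's values):
# instructions issued per second and new-order transactions per second
# for a small-memory and a VLM configuration.
configs = {
    "1 GB": {"instr_per_sec": 400e6, "new_order_tps": 30.0},
    "6 GB": {"instr_per_sec": 650e6, "new_order_tps": 60.0},
}

# Normalize each per-second counter by throughput to get a per-transaction figure.
instr_per_txn = {
    name: c["instr_per_sec"] / c["new_order_tps"] for name, c in configs.items()
}

# Express relative to the 1-GB baseline, as in Table 2's relative results.
baseline = instr_per_txn["1 GB"]
relative = {name: v / baseline for name, v in instr_per_txn.items()}

for name in configs:
    print(f"{name}: {instr_per_txn[name]:.3g} instr/txn, relative {relative[name]:.2f}")
```

In this sketch the 6-GB configuration issues more instructions per second yet fewer instructions per transaction, mirroring the paper's observation that raw rates (bus utilization, instructions issued) can rise while the normalized per-transaction cost falls.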