```c
// TEST_AND_SET implements the Alpha version of a test-and-set operation using
// the load-locked/store-conditional instructions. The purpose of this
// function is to check the value pointed to by spinlock_address and, if the
// value is 0, set it to 1 and return success (1) in R0. If either the spinlock
// value is already 1 or the store-conditional fails, the value of the spinlock
// remains unchanged and a failure status (0, 2, or 3) is returned in R0.
//
// The status returned in R0 is one of the following:
//   0 - failure (spinlock was clear; still clear, store-conditional failed)
//   1 - success (spinlock was clear; now set)
//   2 - failure (spinlock was set; still set, store-conditional failed)
//   3 - failure (spinlock was set; still set)
//
// Uses the DEC C asm() intrinsic (Digital UNIX): the first argument is
// passed in $16 and the value left in $0 is the result.
#include <c_asm.h>

#define TEST_AND_SET(spinlock_address)                                      \
    asm("ldl_l $0,($16);"  /* $0 = old lock value; start locked sequence */ \
        "or    $0,1,$1;"   /* $1 = old value | 1 */                         \
        "stl_c $1,($16);"  /* try the store; $1 = 1 if it succeeded */      \
        "sll   $0,1,$0;"   /* $0 = old value << 1 */                        \
        "or    $0,$1,$0",  /* $0 = (old value << 1) | store succeeded */    \
        (spinlock_address))

// BASIC_SPINLOCK_ACQUIRE implements the simple case of acquiring a spinlock. If
// the spinlock is already owned or the store-conditional fails, this function
// spins until the spinlock is acquired. This function doesn't return until the
// spinlock is acquired.

#define BASIC_SPINLOCK_ACQUIRE(spinlock_address)                    \
{                                                                   \
    long status = 0;                                                \
    while (1)                                                       \
    {                                                               \
        if (*(spinlock_address) == 0)  /* spin on a plain read */   \
        {                                                           \
            status = TEST_AND_SET(spinlock_address);                \
            if (status == 1)                                        \
            {                                                       \
                MB;      /* memory barrier, defined elsewhere */    \
                break;                                              \
            }                                                       \
        }                                                           \
    }                                                               \
}
```

Figure 1 Code Sequences for Locking Intrinsics

instruction-cache miss rate of 10 to 12 percent can effectively stall the
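The Figure 1 sequences depend on the DEC C asm() intrinsic and on the Alpha ldl_l/stl_c instructions, so they compile only on that platform. The following is a minimal portable sketch of the same logic, assuming a C11 compiler with <stdatomic.h>; it is not the article's code. The function names test_and_set, basic_spinlock_acquire, and spinlock_release are hypothetical. A weak compare-exchange is swapped in for the load-locked/store-conditional pair: like stl_c, it fails if the lock changed after the load and is also permitted to fail spuriously, so the same four status codes apply.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical portable analogue of TEST_AND_SET.
 * Returns (old_value << 1) | store_succeeded, the encoding of Figure 1:
 *   0 - was clear; still clear, store failed
 *   1 - was clear; now set (success)
 *   2 - was set; still set, store failed
 *   3 - was set; still set
 * The weak compare-exchange stands in for ldl_l/stl_c. */
long test_and_set(atomic_long *lock)
{
    long old = atomic_load_explicit(lock, memory_order_relaxed);
    long expected = old;
    _Bool stored = atomic_compare_exchange_weak_explicit(
        lock, &expected, old | 1,
        memory_order_relaxed, memory_order_relaxed);
    return (old << 1) | (long)stored;
}

/* Hypothetical analogue of BASIC_SPINLOCK_ACQUIRE: returns only once the
 * lock has been acquired. */
void basic_spinlock_acquire(atomic_long *lock)
{
    for (;;) {
        /* Test with a plain load first, so waiters spin on a cached copy
         * instead of repeatedly attempting the atomic store. */
        if (atomic_load_explicit(lock, memory_order_relaxed) == 0 &&
            test_and_set(lock) == 1) {
            /* Plays the role of MB: accesses in the critical section
             * cannot be reordered before the acquisition. */
            atomic_thread_fence(memory_order_acquire);
            return;
        }
    }
}

/* Hypothetical release: the release store publishes the writes made
 * inside the critical section to the next acquirer. */
void spinlock_release(atomic_long *lock)
{
    assert(atomic_load_explicit(lock, memory_order_relaxed) == 1);
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

The design mirrors BASIC_SPINLOCK_ACQUIRE: checking the lock with an ordinary read before attempting the atomic operation (test-and-test-and-set) keeps waiting processors from generating store traffic on the bus while the lock is held, and only a status of 1 counts as an acquisition.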
Figure 2 Database Cache Size Versus Throughput (throughput plotted against database cache size in GB)

processes. For example, an 8-GB system allows 6.6 GB to be used for the database cache.

Performance Analysis

Why does the use of VLM improve performance by a factor of nearly 2? Using statistics within the database, we measured the database-cache hit ratio as memory was added. Figure 3 shows the direct correlation between more memory and decreased database-cache misses: as memory is added, the database-cache miss rate declines from 12 percent to 5 percent. This raises two more questions: (1) Why does the database-cache miss rate remain at 5 percent? and (2) Why does a small change in database-cache miss rates improve the throughput so greatly?

The answer to the first question is that with a database size of more than 100 GB, it is not possible to cache the entire database. The cache speeds up the transactions that are read-intensive, but it does not entirely eliminate I/O contention.

Figure 3 Cache Miss Rates and Bus Utilization (bus utilization and B-cache, I-cache, and database cache miss rates plotted against memory in GB)

To answer the second question, we need to look at the AlphaServer 8400 system's hardware counters that measure the instruction-cache (I-cache) miss rate, the board-cache (B-cache) miss rate, and the bandwidth used on the multiprocessor bus. With an increase in throughput and memory size, the VLM system is spanning a larger data space, and the bus utilization increases from 24 percent to 32 percent. Intuitively, one might think this would result in less optimal instruction- and data-stream locality, thus increasing both miss rates. As shown in Figure 3, this proved true for instruction-stream misses (I-cache miss rate) but not for the data stream, as represented by the B-cache miss rate.
The instruction stream rarely results in B-cache misses, so B-cache misses can be attributed primarily to the data stream.

Performance analysis requires careful examination of the throughput of the system under test. The apparent paradox just related can be resolved if we normalize the statistics to the throughput achieved. Figure 4 shows that the instruction-cache misses per transaction declined slightly as the memory size was increased from 1 GB to 6 GB and as transaction throughput doubled. Furthermore, the B-cache works substantially better with more memory: misses declined by 2S on a per-transaction basis. Why is this so?

Analysis of the system monitor data for each run indicates that bringing the data into memory helped reduce the I/O per second by 30 percent. If a transaction is forced to wait for an I/O operation, the I/O is performed asynchronously, and the database causes some other thread to begin executing. Without VLM, 12 percent of transactions miss the database cache and thus stall for I/O activity. With VLM, only 5 percent of the transactions miss the database cache, and the time to perform each transaction is greatly reduced. Thus each thread or process has a shorter transaction latency. The shorter latency contributes to a 15-percent reduction in system context switch rates. We attribute the measured improvement in hardware miss rates per transaction when using VLM to the improvement in context switching.

The performance counters on the Alpha microprocessor were used to collect the number of instructions issued and the number of cycles. In Table 2, the relative instructions-per-transaction results are the ratios of instructions issued per second to the number of new-order transactions.
(In TPC-C, each transaction type has a different code path and instruction count; therefore, the instructions-per-transaction amount is a relative figure, not the actual instruction count of a single new-order transaction.)

The relative difference between instructions per transaction for 1 GB of database memory versus 6 GB of database memory is the measured effect of eliminating 30 percent of the I/O operations, satisfying more transactions from main memory, reducing context switches, and reducing lock contention.

Vol. 8 No. 3 1996
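The normalization used above (dividing per-second hardware counts by the transaction throughput achieved) can be sketched in C as follows. The structure and function names are hypothetical, and the sample numbers in the note after the block are illustrative only, not measurements from this study.

```c
#include <assert.h>

/* Hypothetical counters for one benchmark run, both expressed per second. */
struct run_sample {
    double events_per_sec;     /* e.g., B-cache misses or instructions issued */
    double new_orders_per_sec; /* TPC-C new-order throughput */
};

/* Normalize a per-second counter to the throughput achieved. In TPC-C this
 * divides by new-order transactions only, so the result is a relative
 * measure rather than the true cost of one new-order transaction. */
double per_transaction(struct run_sample s)
{
    assert(s.new_orders_per_sec > 0.0);
    return s.events_per_sec / s.new_orders_per_sec;
}

/* Ratio of run b's per-transaction figure to run a's; a value below 1.0
 * means run b incurs fewer events per transaction than run a. */
double relative_per_transaction(struct run_sample a, struct run_sample b)
{
    return per_transaction(b) / per_transaction(a);
}
```

For example, with made-up values, a run with 100 misses per second at 10 new-order transactions per second and a run with 160 misses per second at 20 new-order transactions per second give 10 and 8 misses per transaction, a ratio of 0.8. The raw miss count per second rises, yet each transaction incurs fewer misses, which is the kind of apparent paradox that normalizing to throughput resolves.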