Levels of the Memory Hierarchy

Example: Direct Mapping
° Cache size = 8 Bytes = 2^3 Bytes, so N = 3
° Cache block size = 1 Byte = 2^0 Bytes, so M = 0
° I = N - M = 3; i.e., 2^3 = 8 blocks indexed 000 … 111
° Suppose a program execution refers to addresses 22, 26, 22, 18, 16, 3, 16, 26, …

Address reference A   Address reference A   Index   Tag
(in decimal)          (in binary)
22                    10110                 110     10
26                    11010                 010     11
30                    11110                 110     11
17                    10001                 001     10
16                    10000                 000     10
3                     00011                 011     00
16                    10000                 000     10
20                    10100                 100     10
#1C_9

Cache reference
° The cache contents just after the references to 22, 26, 22 were done:

Cache Index   V   Cache Tag   Cache Data
000           N
001           N
010           Y   11          Mem(11010)
011           N
100           N
101           N
110           Y   10          Mem(10110)
111           N
#1C_10
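The index/tag split used in the table above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function name `split_address` and the 5-bit address width are assumptions chosen to match the example.

```python
def split_address(addr, n_bits, m_bits, addr_bits=5):
    """Split an address into (tag, index, byte offset) bit strings.

    n_bits is log2(cache size in bytes), m_bits is log2(block size in bytes).
    addr_bits=5 is an assumption that matches the 5-bit addresses of the example.
    """
    index_bits = n_bits - m_bits
    offset = addr & ((1 << m_bits) - 1)               # lowest M bits
    index = (addr >> m_bits) & ((1 << index_bits) - 1)  # next I = N - M bits
    tag = addr >> n_bits                              # remaining upper bits

    def fmt(value, width):
        # A zero-width field (e.g. the offset when M = 0) is an empty string.
        return format(value, f"0{width}b") if width > 0 else ""

    return fmt(tag, addr_bits - n_bits), fmt(index, index_bits), fmt(offset, m_bits)


# Reproduce the rows of the direct-mapping table (N = 3, M = 0).
for a in (22, 26, 30, 17, 16, 3, 16, 20):
    tag, index, _ = split_address(a, n_bits=3, m_bits=0)
    print(a, format(a, "05b"), index, tag)
```

Calling the same function with `m_bits=1` gives the tag, index and byte offset for a cache with 2-Byte blocks.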

Example: Cache with 2-Byte Blocks
° Say, the cache size is 8 Bytes; 8 = 2^N, N = 3
° Block size = 2 Bytes, i.e., 2^M with M = 1
° So, I = N - M = 3 - 1 = 2
• 2^2 = 4 cache blocks numbered 0..3 (00..11 in binary)

Decimal address   Binary address   Index   Byte Offset
of Reference      of Reference
22                10110            11      0
26                11010            01      0
22                10110            11      0
18                10010            01      0
16                10000            00      0
3                 00011            01      1
16                10000            00      0
26                11010            01      0
#1C_15

Cache reference
° A reference address has 3 parts:
• Lower N bits: the last M bits are the Byte-offset, the rest are the index
• The remaining bits are, as before, the tag
° A cache block holds the data for a reference address and is associated with:
• Cache tag: the tag of the reference address
• Valid bit: indicates whether the block has been initialised
° Cache contents after 22, 26, 22 (, 18, 16, 3, 16, 26):

Cache Index   V   Cache Tag   Cache Data
00            N
01            Y   11          Memory(11011)  Memory(11010)
10            N
11            Y   10          Memory(10111)  Memory(10110)
#1C_16

Cache reference
° Cache contents after 22, 26, 22, 18, 16, (3, 16, 26, …):
° 22 and 26 are misses
° The second reference to 22 is a hit
° 18 in binary = 10010; it is a miss and replaces the block holding Mem(26)
° 16 in binary = 10000; it is a miss

Cache Index   V   Cache Tag   Cache Data
00            Y   10          Memory(17)  Memory(16)
01            Y   10          Memory(19)  Memory(18)   (previously 27, 26)
10            N
11            Y   10          Memory(23)  Memory(22)
#1C_17

Cache reference
° Cache contents after 22, 26, 22, 18, 16, 3, 16, (26, …):
° 3 in binary = 00011; it is a miss and replaces the block holding Memory(19)
° 16 is a hit

Cache Index   V   Cache Tag   Cache Data
00            Y   10          Memory(17)  Memory(16)
01            Y   00          Mem(3)  Mem(2)   (previously 27, 26; then 19, 18)
10            N
11            Y   10          Memory(23)  Memory(22)
#1C_18
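The hit/miss sequence walked through in these slides can be checked with a small simulation. The following is only a sketch of a direct-mapped cache with 8 Bytes and 2-Byte blocks; the function name `simulate_direct_mapped` is hypothetical, and the cache stores tags only (no data) since hits and misses are all that is tracked.

```python
def simulate_direct_mapped(refs, n_bits=3, m_bits=1):
    """Return a list of (address, 'hit' or 'miss') for a direct-mapped cache.

    n_bits = log2(cache size), m_bits = log2(block size), as in the example.
    """
    num_blocks = 1 << (n_bits - m_bits)
    valid = [False] * num_blocks   # valid bit per cache block
    tags = [0] * num_blocks        # tag per cache block
    outcome = []
    for addr in refs:
        index = (addr >> m_bits) % num_blocks  # drop the byte offset, keep I bits
        tag = addr >> n_bits                   # everything above the lower N bits
        if valid[index] and tags[index] == tag:
            outcome.append((addr, "hit"))
        else:
            # Miss: fetch the block and overwrite whatever was at this index.
            valid[index], tags[index] = True, tag
            outcome.append((addr, "miss"))
    return outcome


for addr, result in simulate_direct_mapped([22, 26, 22, 18, 16, 3, 16, 26]):
    print(addr, result)
```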

Cache reference
° Cache contents after 22, 26, 22, 18, 16, 3, 16, 26, (…):
° 26 is a miss
• 26 in binary = 11010, so it replaces the block holding Mem(2)

Cache Index   V   Cache Tag   Cache Data
00            Y   10          Memory(17)  Memory(16)
01            Y   11          Memory(27)  Memory(26)   (previously 27, 26; then 19, 18; then 3, 2)
10            N
11            Y   10          Memory(23)  Memory(22)
#1C_19

Block Size Trade-off
° In general, a larger block size takes advantage of spatial locality BUT:
• A larger block size means a larger miss penalty:
- It takes longer to fill up the block
• If the block size is too big relative to the cache size, the miss rate will go up
- Too few cache blocks
° In general, Average Access Time:
• = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate
[Figure: three plots against Block Size. Miss Penalty rises with block size. Miss Rate first falls (exploits spatial locality), then rises (fewer blocks: compromises temporal locality). Average Access Time first falls, then rises (increased miss penalty and miss rate).]
#1C_20
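The Average Access Time formula above is simple enough to evaluate directly. In the sketch below, the function name and the numbers (hit time of 1 cycle, miss penalty of 20 cycles, miss rate of 5%) are illustrative assumptions, not measurements from the slides.

```python
def average_access_time(hit_time, miss_penalty, miss_rate):
    """Average Access Time = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate."""
    return hit_time * (1 - miss_rate) + miss_penalty * miss_rate


# Hypothetical figures, in clock cycles, chosen only to exercise the formula.
print(average_access_time(hit_time=1, miss_penalty=20, miss_rate=0.05))
```

Evaluating the function for several (miss penalty, miss rate) pairs is one way to see why the average access time can get worse once blocks become too large.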

Fully Associative
° Fully Associative Cache
• Store the data in any block (instead of using a specific, direct-mapped block) and
• Do not bother with the cache index at all
• An address reference, if in the cache, can be in any block
• So an address reference needs to be compared with the tag of every cache block
• When the block size is 1 byte, there are as many tags as there are blocks in the cache
#1C_21

Fully Associative Cache: example
° Fully Associative Cache
• Store the data in any block (instead of using a specific, direct-mapped block) and do not bother with the cache index
• Example: Cache Size = 32 Bytes, block size = 1; we need 32 comparators operating in parallel, each 32 bits wide
• Example: Cache Size = 512 Bytes, block size = 1; we need 512 comparators operating in parallel, each 32 bits wide
[Figure: the address (bits 31..0) is used entirely as the Cache Tag (32 bits long); each cache entry holds a Cache Tag, a Valid Bit and Cache Data, and the address is compared with the tag of every entry.]
#1C_22
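A fully associative lookup can be sketched as a search over every entry. In hardware the comparisons happen in parallel; the loop below is only a sequential stand-in. The class name is hypothetical, and the round-robin replacement is an assumption, since the slides do not state a replacement policy at this point.

```python
class FullyAssociativeCache:
    """Fully associative cache with 1-byte blocks (the whole address is the tag)."""

    def __init__(self, num_blocks):
        self.entries = [{"valid": False, "tag": None} for _ in range(num_blocks)]
        self.next_victim = 0  # assumption: simple round-robin replacement

    def access(self, addr):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        # Compare the address with the tag of every block; no index is used.
        for entry in self.entries:
            if entry["valid"] and entry["tag"] == addr:
                return True
        victim = self.entries[self.next_victim]
        victim["valid"], victim["tag"] = True, addr
        self.next_victim = (self.next_victim + 1) % len(self.entries)
        return False


cache = FullyAssociativeCache(num_blocks=8)
for addr in (22, 26, 22, 18, 16, 3, 16, 26):
    print(addr, "hit" if cache.access(addr) else "miss")
```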

Q4: What happens on a write?
° Modifications of cache data must be passed on to main memory
° Two approaches:
• Write through: the information is written to both the block in the cache and the block in the lower-level memory.
• Write back: the information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.
- Is the block clean or dirty? Use a dirty bit per block: it is set if the cache block is modified.
° Pros and Cons of each?
• WT: replacement can be quicker, as writes are done in real time on main memory as well
• WB: repeated writes to a block do not each go to main memory; the dirty bit marks the block to be written once, on replacement
#1C_25

Recall: Levels of the Memory Hierarchy
[Figure: memory levels listed by Capacity, Access Time and Cost, starting with CPU Registers (100s of Bytes); the rest of the figure is missing.]
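The difference in main-memory traffic between the two write policies can be illustrated with a deliberately tiny model. The sketch below assumes a cache that holds a single block at a time and counts a final flush of a dirty block; both are simplifying assumptions, and the function name is hypothetical.

```python
def count_memory_writes(block_writes, policy):
    """Count main-memory writes for a sequence of writes to named blocks.

    Assumes a one-block cache; a dirty block still cached at the end is
    flushed and counted (an assumption made for a fair comparison).
    """
    if policy == "write-through":
        return len(block_writes)  # every write also goes to main memory
    if policy != "write-back":
        raise ValueError("policy must be 'write-through' or 'write-back'")
    cached, dirty, writes = None, False, 0
    for block in block_writes:
        if block != cached:
            if dirty:
                writes += 1       # dirty block is written back on replacement
            cached, dirty = block, False
        dirty = True              # the write sets the dirty bit
    return writes + (1 if dirty else 0)


trace = ["A", "A", "A", "B", "A"]
print(count_memory_writes(trace, "write-through"))
print(count_memory_writes(trace, "write-back"))
```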

Virtual Memory: Introduction
° Main memory (DRAM) acts as a 'cache' for the secondary store (Disk)
• The technique for doing this is called virtual memory
° Two Main Reasons
• An illusion of a (mostly) fast main memory that is as large as the disk itself (as done in cache management)
• To support efficient and safe sharing of main memory across multiple programs
- Note the granularity change: a cache assumes a single program accessing data, while in virtual memory many programs sharing main memory are considered
• Total memory required by all programs may be much larger than the actual size of the main memory itself
• But each program actively uses only a fraction of its memory requirement at any given time
- Just like a cache holds the active portion of a given program
#1C_27

Virtual Memory: Requirements & Approach
° The number of programs executing at the same time cannot be known a priori
• It varies dynamically and hence portions of main memory cannot simply be pre-allocated
° While sharing is dynamic, programs should also be protected from each other
• If two programs share data (a shared variable) then that must be supported as well
° When a program is compiled, it is given its own virtual address space: a separate range of memory locations accessible only to this program
° Virtual memory translates this address space into physical addresses
• This translation provides the necessary protection and the permitted sharing
#1C_28

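Address Translation
° V = {0, 1, . . . , n - 1}: virtual address space
° M = {0, 1, . . . , m - 1}: physical address space
° n >> m
° MAP: V --> M U {0}: address mapping function
• MAP(a) = a' if data at virtual address a is present at physical address a' and a' in M
• MAP(a) = 0 if data at virtual address a is not present in M
[Figure: the processor issues virtual address a from name space V; the address translation mechanism either produces physical address a' for main memory or raises a missing item fault; the fault handler brings the item from secondary memory into main memory; the Op Sys performs this transfer.]
#1C_31

Paging Organization
[Figure: Physical Memory (8KB) is divided into 1K frames, frame 0 to frame 7, at P.A. 0, 1024, …, 7168. Virtual Memory (32KB) is divided into 1K pages, page 0 to page 31, at V.A. 0, 1024, …, 31744. The address translation MAP maps pages to frames.]
° The page is the unit of mapping and also the unit of transfer from virtual to physical memory

Address Translation (assuming a hit)
° The V.A. consists of a page no. and an offset
° The Page Table Register and the page no. together index into the page table, which is located in physical memory
° A page table entry holds V, Access Rights and PA
° PA + offset (actually, concatenation) gives the physical memory address
#1C_32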
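The sizes in the paging organization figure determine how a virtual address is split. The sketch below derives the field widths for 32KB of virtual memory, 8KB of physical memory and 1K pages; the function names are hypothetical and all sizes are assumed to be powers of two.

```python
def paging_geometry(virtual_bytes, physical_bytes, page_bytes):
    """Derive page counts and address field widths (sizes must be powers of two)."""
    pages = virtual_bytes // page_bytes
    frames = physical_bytes // page_bytes
    return {
        "virtual_pages": pages,
        "physical_frames": frames,
        "offset_bits": page_bytes.bit_length() - 1,   # log2 of a power of two
        "page_number_bits": pages.bit_length() - 1,
        "frame_number_bits": frames.bit_length() - 1,
    }


def concatenate(frame, offset, offset_bits):
    """Physical address = frame number concatenated with the page offset."""
    return (frame << offset_bits) | offset


geometry = paging_geometry(32 * 1024, 8 * 1024, 1024)
print(geometry)
print(concatenate(frame=7, offset=0, offset_bits=geometry["offset_bits"]))
```

Because the offset occupies exactly the low-order bits, shifting and OR-ing gives the same result as multiplying the frame number by the page size and adding the offset, which is why the figure notes that the "+" is actually a concatenation.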

Page Table and Process state
° Each program, on compilation, is given its own page table
° The Page Table itself resides in the main memory
° The address of the start of the page table is in the page table register
° The Page Table Register, together with other registers, specifies the state of that program, referred to as the process state
• If a program needs to relinquish the CPU for another program (pre-emption), the process state needs to be stored first by the Operating System
• Restoring the program execution involves the Operating System reloading the state
° Program Counter (PC):
• One of the other registers
• Holds the address of the next instruction to be executed
#1C_33

Page Table: Continued
° Indexed by virtual page number
• That is, the page table has an entry for each virtual page
• How many page table entries will the virtual memory in the figure of #1C_32 require?
° So, no index is needed, as in a fully associative (FA) cache
° But … there is a subtle difference with an FA cache
• For a given memory address, we extract the tag and search the entire cache (see #1C_22, 23)
• The outcome is that an entry for the reference address is either found (hit) or not found
° With a page table, an entry for a virtual page is always found (see above); this does not mean it is a hit
° Hit or miss (page fault) is decided by the V-bit in the entry
° V bit = 1 means that the entry has the physical address where the desired virtual page resides (hit)
° V bit = 0 means that the entry points to the disk address from which the desired page can be fetched (page fault)
#1C_34

Page fault
° If the V bit for a virtual page is 0, it is a page fault: the entry contains the secondary memory address (disk address) where the page can be found
• V bit = 0 implies a page fault
[Figure: a Page Table indexed by VP No., with V bits 1, 0, 1, 1, 0, 1; entries with V = 1 point into main memory, entries with V = 0 point to the DISK.]
#1C_35

Page replacement
° If a page fault occurs and all pages in memory are in use, the OS chooses a page to replace
• By the LRU scheme (seen earlier)
• Suppose pages have been referred to in the order: 10, 12, 9, 7, 11, 10
• If a reference to page 8 results in a page fault needing replacement
- The LRU page is page 12, which is replaced by page 8
• If another reference also results in a page fault, then the LRU page is ….
° It is difficult to implement the LRU scheme strictly
• It is impractical to record every time a page is referred to
• So, use a reference bit in the page table for each virtual page
• A reference bit is set to 1 whenever that page is referred to
• The OS periodically records which pages have been referred to in the past period and also resets the ref bits to 0
• Qn: will the ref bit of a page with V-bit = 0 ever be set to 1?
#1C_36
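The choice of the LRU page in the example above can be reproduced with a short sketch. This models strict LRU, which the slides note is impractical in a real OS, and the function name `lru_page` is hypothetical.

```python
from collections import OrderedDict


def lru_page(reference_order):
    """Return the least recently used page given the order of references."""
    recency = OrderedDict()
    for page in reference_order:
        recency.pop(page, None)  # forget the older position of this page, if any
        recency[page] = True     # re-insert at the most recently used end
    return next(iter(recency))   # the first key is the least recently used


# Page 10 is referenced twice, so its last use is the most recent one.
print(lru_page([10, 12, 9, 7, 11, 10]))
```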

Writes
° A write-through scheme requires write buffers, which are useful only when the write is fast
° Writing a page to a disk requires many, many clock cycles
° So, only the write-back scheme is used
° A Dirty bit is kept (in the page table) for each page
• It is set to 1 when the page is written
° If the OS chooses to replace the page, the dirty bit will indicate whether or not the page needs to be written to the disk before being replaced
#1C_37

Virtual Address and a Cache
[Figure: the CPU sends the VA to Translation, which sends the PA to the Cache; a hit returns data to the CPU, a miss goes to Main Memory.]
° It takes an extra memory access to translate VA to PA
° This makes cache access very expensive, and this is the "innermost loop" that you want to go as fast as possible
° ASIDE: Why access the cache with PA at all? VA caches have a problem!
• Synonym / alias problem: two different virtual addresses map to the same physical address => two different cache entries holding data for the same physical address!
• For an update: must update all cache entries with the same physical address, or memory becomes inconsistent
• Determining this requires significant hardware, essentially an associative lookup on the physical address tags to see if you have multiple hits
#1C_38

TLBs
° A way to speed up translation is to use a special cache of recently used page table entries; this has many names, but the most frequently used is Translation Lookaside Buffer or TLB
° A TLB entry holds: Virtual Address, Physical Address, Dirty, Ref, Valid, Access
° TLB access time is comparable to cache access time (much less than main memory access time)
#1C_39

Translation Look-Aside Buffers
° Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped
° TLBs are usually small, typically not more than 128 - 256 entries even on high end machines. This permits fully associative lookup on these machines. Most mid-range machines use small n-way set associative organizations.

Translation with a TLB
[Figure: the CPU sends the VA to the TLB Lookup; on a TLB hit the PA goes to the Cache, on a TLB miss it goes through Translation first; a cache hit returns data, a cache miss goes to Main Memory. Approximate access times: TLB lookup 1/2 t, cache t, main memory 20 t.]
#1C_40
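The idea of a TLB as a small cache of page table entries can be sketched as follows. This is an illustration under stated assumptions: the class is hypothetical, a plain dictionary stands in for the page table in main memory, every page is assumed to be resident (no page faults), and LRU replacement is assumed because the slides do not name a TLB replacement policy.

```python
from collections import OrderedDict


class TLB:
    """Small fully associative cache of virtual-page -> physical-page mappings."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table  # stand-in for the page table in main memory
        self.entries = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)      # mark as most recently used
            self.hits += 1
            return self.entries[vpn]
        self.misses += 1
        frame = self.page_table[vpn]           # slow path: read the page table
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
        self.entries[vpn] = frame
        return frame


tlb = TLB(capacity=2, page_table={0: 3, 1: 1, 2: 0})
for vpn in (0, 1, 0, 2, 1):
    tlb.lookup(vpn)
print(tlb.hits, tlb.misses)
```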

Virtual Memory Example
° A virtual memory system has a page size of 1024 bytes, eight virtual pages, and four physical pages. Based on the contents of the Page Table shown in #1C_42, answer each of the following three questions.
• What is the main memory address for virtual address 4096?
• What is the main memory address for virtual address 1024?
• A fault occurs on page 0. Will any of the pages currently in main memory need to be replaced to handle this page fault? Explain your answer.
° Hints:
• The virtual page i, 0 ≤ i ≤ 7, contains memory locations with addresses ranging from i x 1024 to (i+1) x 1024 - 1 (inclusive).
• 2^10 = 1024.
#1C_41

The Page Table

Virtual Page No.   V-Bit   Other Bits + Access Rights   Address
000                0       …                            01001011100
001                0       …                            11101110010
010                1       …                            00
011                0       …                            00001001111
100                1       …                            01
101                0       …                            10100111001
110                1       …                            10
111                0       …                            01010001011
#1C_42
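One way to work through the three questions is to encode the page table and apply the translation rule. The sketch below is not an official solution: it assumes that the Address field of an entry with V = 1 is a 2-bit physical page number and that the field of an entry with V = 0 is a disk address. The names `PAGE_TABLE`, `translate` and `free_frames` are hypothetical.

```python
PAGE_SIZE = 1024

# (V-bit, Address field) for virtual pages 0..7, transcribed from the page table.
PAGE_TABLE = [
    (0, "01001011100"),
    (0, "11101110010"),
    (1, "00"),
    (0, "00001001111"),
    (1, "01"),
    (0, "10100111001"),
    (1, "10"),
    (0, "01010001011"),
]


def translate(virtual_address):
    """Return the main memory address, or None if the access is a page fault."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    valid, field = PAGE_TABLE[page]
    if not valid:
        return None  # V = 0: the field is a disk address, so this is a page fault
    return int(field, 2) * PAGE_SIZE + offset  # physical page number, then offset


def free_frames(num_frames=4):
    """List the physical pages not used by any entry with V = 1."""
    used = {int(field, 2) for valid, field in PAGE_TABLE if valid}
    return [frame for frame in range(num_frames) if frame not in used]


print(translate(4096))  # virtual page 4
print(translate(1024))  # virtual page 1
print(free_frames())    # a non-empty list means a fault needs no replacement
```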

Summary #1 / 4:
° The Principle of Locality:
• A program is likely to access a relatively small portion of the address space at any instant of time.
- Temporal Locality: Locality in Time
- Spatial Locality: Locality in Space
° Cache Design Space
• Total size, block size, associativity
• Replacement policy
• Write policy (write-through, write-back)
#1C_43

Summary #2 / 4: The Cache Design Space
° Several interacting dimensions
• cache size
• block size
• associativity
• write-through vs write-back
° The optimal choice is a compromise
• depends on access characteristics
• depends on technology / cost
° Simplicity often wins
[Figure: the design space drawn along the axes Cache Size, Associativity and Block Size, with a trade-off curve running between Good and Bad as Factor A and Factor B go from Less to More.]
#1C_44

Summary #3 / 4: TLB, Virtual Memory
° Caches, TLBs and Virtual Memory can all be understood by examining how they deal with 4 questions:
• 1) Where can a block be placed?
• 2) How is a block found?
• 3) What block is replaced on a miss?
• 4) How are writes handled?
° Page tables map virtual addresses to physical addresses
° TLBs are important for fast translation
° TLB misses are significant in processor performance
#1C_45

Summary #4 / 4: Memory Hierarchy
° Virtual memory was controversial at the time: can SW automatically manage 64KB across many programs?
• 1000X DRAM growth removed the controversy
° Today VM allows many processes to share a single memory without having to swap all processes to disk; VM protection is more important than the memory hierarchy
° Today CPU time is a function of (operations, cache misses) vs. just f(operations):
• What does this mean for Compilers, Data structures, Algorithms?
#1C_46
