Is Parallel Programming Hard, And, If So, What Can You Do About It?

5.4.4 Resource Allocator Caches

This section presents a simplified schematic of a parallel fixed-block-size memory allocator. More detailed descriptions may be found in the literature [MG92, MS93, BA01, MSK01] or in the Linux kernel [Tor03].

5.4.4.1 Parallel Resource Allocation Problem

The basic problem facing a parallel memory allocator is the tension between the need to provide extremely fast memory allocation and freeing in the common case and the need to efficiently distribute memory in the face of unfavorable allocation and freeing patterns.

To see this tension, consider a straightforward application of data ownership to this problem — simply carve up memory so that each CPU owns its share. For example, suppose that a system with two CPUs has two gigabytes of memory (such as the one that I am typing on right now). We could simply assign each CPU one gigabyte of memory, and allow each CPU to access its own private chunk of memory, without the need for locking and its complexities and overheads. Unfortunately, this simple scheme breaks down if an algorithm happens to have CPU 0 allocate all of the memory and CPU 1 free it, as would happen in a simple producer-consumer workload.

The other extreme, code locking, suffers from excessive lock contention and overhead [MS93].

5.4.4.2 Parallel Fastpath for Resource Allocation

The commonly used solution uses parallel fastpath with each CPU owning a modest cache of blocks, and with a large code-locked shared pool for additional blocks. To prevent any given CPU from monopolizing the memory blocks, we place a limit on the number of blocks that can be in each CPU's cache. In a two-CPU system, the flow of memory blocks will be as shown in Figure 5.27: when a given CPU is trying to free a block when its pool is full, it sends blocks to the global pool, and, similarly, when that CPU is trying to allocate a block when its pool is empty, it retrieves blocks from the global pool.

[Figure 5.27: Allocator Cache Schematic. The CPU 0 and CPU 1 pools, each owned by its CPU, handle allocate/free requests locally; on overflow a per-CPU pool sends blocks to the code-locked global pool, and when empty it retrieves blocks from it.]

5.4.4.3 Data Structures

The actual data structures for a "toy" implementation of allocator caches are shown in Figure 5.28. The "Global Pool" of Figure 5.27 is implemented by globalmem of type struct globalmempool, and the two CPU pools by the per-CPU variable percpumem of type struct percpumempool. Both of these data structures have arrays of pointers to blocks in their pool fields, which are filled from index zero upwards. Thus, if globalmem.pool[3] is NULL, then the remainder of the array from index 4 up must also be NULL. The cur fields contain the index of the highest-numbered full element of the pool array, or -1 if all elements are empty. All elements from globalmem.pool[0] through globalmem.pool[globalmem.cur] must be full, and all the rest must be empty.^8

    #define TARGET_POOL_SIZE 3
    #define GLOBAL_POOL_SIZE 40

    struct globalmempool {
        spinlock_t mutex;
        int cur;
        struct memblock *pool[GLOBAL_POOL_SIZE];
    } globalmem;

    struct percpumempool {
        int cur;
        struct memblock *pool[2 * TARGET_POOL_SIZE];
    };

    DEFINE_PER_THREAD(struct percpumempool, percpumem);

Figure 5.28: Allocator-Cache Data Structures

The operation of the pool data structures is illustrated by Figure 5.29, with the six boxes representing the array of pointers making up the pool field, and the number preceding them representing the cur field.

^8 Both pool sizes (TARGET_POOL_SIZE and GLOBAL_POOL_SIZE) are unrealistically small, but this small size makes it easier to single-step the program in order to get a feel for its operation.
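Given these data structures, the fastpath/slowpath split can be made concrete. The following is a minimal sketch of a pair of allocation and free functions built on Figure 5.28, not a definitive implementation: the names memblock_alloc() and memblock_free() are illustrative, __get_thread_var() is assumed to be the accessor that pairs with DEFINE_PER_THREAD(), and GLOBAL_POOL_SIZE is assumed large enough to hold every block in the system, so that the free-side slowpath can never overflow the global pool.

    struct memblock *memblock_alloc(void)
    {
        int i;
        struct memblock *p;
        struct percpumempool *pcpp;

        pcpp = &__get_thread_var(percpumem);
        if (pcpp->cur < 0) {
            /* Slowpath: per-CPU pool empty, so refill it from the
             * code-locked global pool, up to TARGET_POOL_SIZE
             * blocks at a time. */
            spin_lock(&globalmem.mutex);
            for (i = 0; i < TARGET_POOL_SIZE && globalmem.cur >= 0; i++) {
                pcpp->pool[i] = globalmem.pool[globalmem.cur];
                globalmem.pool[globalmem.cur--] = NULL;
            }
            pcpp->cur = i - 1;
            spin_unlock(&globalmem.mutex);
        }
        if (pcpp->cur >= 0) {
            /* Fastpath: take the highest-numbered block from this
             * CPU's own pool, with no locking whatsoever. */
            p = pcpp->pool[pcpp->cur];
            pcpp->pool[pcpp->cur--] = NULL;
            return p;
        }
        return NULL;  /* Both pools empty: allocation fails. */
    }

    void memblock_free(struct memblock *p)
    {
        int i;
        struct percpumempool *pcpp;

        pcpp = &__get_thread_var(percpumem);
        if (pcpp->cur >= 2 * TARGET_POOL_SIZE - 1) {
            /* Slowpath: per-CPU pool full, so drain its upper half
             * to the global pool, assumed never to overflow. */
            spin_lock(&globalmem.mutex);
            for (i = pcpp->cur; i >= TARGET_POOL_SIZE; i--) {
                globalmem.pool[++globalmem.cur] = pcpp->pool[i];
                pcpp->pool[i] = NULL;
            }
            pcpp->cur = i;
            spin_unlock(&globalmem.mutex);
        }
        /* Fastpath: push the block onto this CPU's own pool. */
        pcpp->pool[++pcpp->cur] = p;
    }

Note the hysteresis designed into the block movement: blocks move between the per-CPU and global pools TARGET_POOL_SIZE at a time, while each per-CPU pool holds up to 2 * TARGET_POOL_SIZE blocks. A CPU that alternately allocates and frees a single block therefore stays within its own pool rather than acquiring globalmem.mutex on every operation.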
