13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual



OPTIMIZING CACHE USAGE

Example 9-11. Memory Copy Using Hardware Prefetch and Bus Segmentation (Contd.)

    movntdq [edi+ecx+48],xmm3
    movntdq [edi+ecx+64],xmm4
    movntdq [edi+ecx+80],xmm5
    movntdq [edi+ecx+96],xmm6
    movntdq [edi+ecx+112],xmm7
    add ecx,128
    cmp ecx,BLOCK_SIZE
    jne cpy_loop
    add esi,ecx
    add edi,ecx
    sub edx,ecx
    jnz main_loop
    sfence
  }
}

9.7.2.8 Performance Comparisons of Memory Copy Routines

The throughput of a large-region memory copy routine depends on several factors:

• Coding techniques that implement the memory copy task
• Characteristics of the system bus (speed, peak bandwidth, overhead in read/write transaction protocols)
• Microarchitecture of the processor

A comparison of the two coding techniques discussed above and two un-optimized techniques is shown in Table 9-2.

Table 9-2. Relative Performance of Memory Copy Routines

Processor, CPUID Signature and FSB Speed                    Byte        DWORD       SW prefetch +       4KB-Block HW prefetch +
                                                            Sequential  Sequential  8 byte streaming    16 byte streaming
                                                                                    store               stores
Pentium M processor, 0x6Dn, 400                             1.3X        1.2X        1.6X                2.5X
Intel Core Solo and Intel Core Duo processors, 0x6En, 667   3.3X        3.5X        2.1X                4.7X
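
To make the columns of Table 9-2 concrete, the following is a minimal C sketch of three of the compared approaches: a byte-sequential copy, a DWORD-sequential copy, and a 4 KB block copy that uses 16-byte streaming stores followed by a store fence. It is an illustration, not the manual's code. It uses SSE2 compiler intrinsics rather than inline assembly, the function names are hypothetical, the 64-byte cache-line size is an assumption, and the "touch one byte per cache line" phase is an assumed way of getting the hardware prefetcher to pull a block into cache before it is copied. On targets without SSE2 the block routine falls back to memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>          /* SSE2: _mm_load_si128, _mm_stream_si128 */
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

#define BLOCK_SIZE 4096         /* 4 KB block, as in the table's last column */
#define CACHE_LINE 64           /* assumed cache-line size in bytes */

/* Un-optimized baseline: one byte per iteration. */
void copy_byte_sequential(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Un-optimized baseline: one 32-bit DWORD per iteration.
 * n must be a multiple of 4. The 4-byte memcpy calls keep the
 * accesses well-defined regardless of the buffers' declared type. */
void copy_dword_sequential(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        uint32_t w;
        memcpy(&w, s + i, sizeof w);
        memcpy(d + i, &w, sizeof w);
    }
}

/* Block copy with 16-byte streaming (non-temporal) stores.
 * dst and src must be 16-byte aligned; n must be a multiple of BLOCK_SIZE. */
void copy_block_streaming(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(n % BLOCK_SIZE == 0);
    assert((uintptr_t)d % 16 == 0 && (uintptr_t)s % 16 == 0);

    for (size_t off = 0; off < n; off += BLOCK_SIZE) {
#if HAVE_SSE2
        /* Phase 1 (assumption): read one byte per cache line so the
         * source block is brought into cache ahead of the copy.
         * The volatile access keeps the compiler from dropping the load. */
        for (size_t i = 0; i < BLOCK_SIZE; i += CACHE_LINE)
            (void)*(const volatile unsigned char *)(s + off + i);

        /* Phase 2: aligned 16-byte loads, streaming 16-byte stores
         * (the intrinsic form of movntdq). */
        for (size_t i = 0; i < BLOCK_SIZE; i += 16) {
            __m128i v = _mm_load_si128((const __m128i *)(s + off + i));
            _mm_stream_si128((__m128i *)(d + off + i), v);
        }
#else
        memcpy(d + off, s + off, BLOCK_SIZE);   /* portable fallback */
#endif
    }
#if HAVE_SSE2
    _mm_sfence();   /* make the streaming stores globally visible */
#endif
}
```

All three routines produce the same destination contents; only the memory access pattern differs. The sketch makes no performance claim of its own. The ratios in Table 9-2 apply to the manual's assembly routines on the listed processors, and results on other hardware or with other compilers would need to be measured.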
