13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

GENERAL OPTIMIZATION GUIDELINES

tight, short loops), the per-iteration cost of unpacking/packing tends to be smaller than in situations where the non-vectorizable code contains longer operations or many dependencies. This is because many iterations of a short, tight loop can be in flight in the execution core, so the per-iteration cost of packing and unpacking is only partially exposed and appears to cause very little performance degradation.

Evaluation of the per-iteration cost of packing/unpacking should be carried out in a methodical manner over a selected number of test cases, where each case may implement some combination of the techniques discussed in this section. The per-iteration cost can be estimated by:

  • evaluating the average cycles to execute one iteration of the test case
  • evaluating the average cycles to execute one iteration of a baseline loop sequence of non-vectorizable code

Example 3-26 shows the baseline code sequence that can be used to estimate the average cost of a loop that executes non-vectorizable routines.

Example 3-26. Base Line Code Sequence to Estimate Loop Overhead

    push ebp            ; save the caller's ebp
    mov  ebp, esp       ; ebp = current stack pointer
    sub  ebp, 4         ; move ebp down by 4 bytes
    mov  [ebp], edi     ; store edi to memory at [ebp]
    call foo            ; call the non-vectorizable routine
    mov  [ebp], edi
    call foo
    mov  [ebp], edi
    call foo
    mov  [ebp], edi
    call foo
    add  ebp, 4         ; undo the earlier adjustment of ebp
    pop  ebp            ; restore the caller's ebp
    ret

The average per-iteration cost of packing/unpacking can be derived from measuring the execution times of a large number of iterations by:

    ((Cycles to run TestCase) - (Cycles to run equivalent baseline sequence)) / (Iteration count)
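The arithmetic in the formula above can be sketched in C as follows. This is only an illustration under stated assumptions: the function names `average_cycles_per_iteration` and `per_iteration_pack_cost` are hypothetical and do not come from the manual, and the cycle counts are assumed to have been collected separately (for example with a cycle counter read before and after each run) over the same number of iterations for the test case and the baseline sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: average cycles spent on one iteration of a run. */
double average_cycles_per_iteration(uint64_t total_cycles, uint64_t iterations)
{
    assert(iterations > 0);  /* a zero iteration count has no defined average */
    return (double)total_cycles / (double)iterations;
}

/*
 * Hypothetical helper: estimated per-iteration cost of packing/unpacking,
 * computed as (test case cycles - baseline cycles) / iteration count.
 * The subtraction is done in floating point so that a result below zero
 * (possible when measurement noise exceeds the real overhead) is reported
 * as a negative number instead of wrapping around.
 */
double per_iteration_pack_cost(uint64_t test_case_cycles,
                               uint64_t baseline_cycles,
                               uint64_t iterations)
{
    assert(iterations > 0);
    return ((double)test_case_cycles - (double)baseline_cycles)
           / (double)iterations;
}
```

For instance, a test case measured at 1,500,000 cycles and a baseline measured at 1,200,000 cycles, both over 100,000 iterations, would give an estimated cost of 3 cycles per iteration. These figures are made up for illustration and are not measurements from any processor.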