13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


GENERAL OPTIMIZATION GUIDELINES

3.5.4.3 Stack Optimization

In Example 3-22, an input parameter was copied in turn onto the stack and passed to the non-vectorizable routine for processing. The parameter passing from consecutive memory locations can be simplified by a technique shown in Example 3-25.

Example 3-25. Stack Optimization Technique to Simplify Parameter Passing

    call foo
    mov [ebp+16], eax    ; store the value in EAX at [EBP+16]
    add ebp, 4           ; advance EBP by 4 bytes
    call foo
    mov [ebp+16], eax
    add ebp, 4
    call foo
    mov [ebp+16], eax
    add ebp, 4
    call foo

Stack Optimization can only be used when:
• The serial operations are function calls. The function “foo” is declared as: INT FOO(INT A). The parameter is passed on the stack.
• The order of operation on the components is from last to first.

Note the call to FOO and the advance of EBP when passing the vector elements to FOO one by one from last to first.

3.5.4.4 Tuning Considerations

Tuning considerations for situations represented by the loop of Example 3-22 include:
• Applying one or more of the following combinations:
  - choose an alternate packing technique
  - consider a technique to simplify result passing
  - consider the stack optimization technique to simplify parameter passing
• Minimizing the average number of cycles to execute one iteration of the loop
• Minimizing the per-iteration cost of the unpacking and packing operations

The speed improvement from using the techniques discussed in this section will vary, depending on the choice of combinations implemented and the characteristics of the non-vectorizable routine. For example, if the routine “foo” is short (representative of
