13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual


MULTICORE AND HYPER-THREADING TECHNOLOGY

User/Source Coding Rule 32. (H impact, M generality) Minimize the sharing of data between threads that execute on different bus agents sharing a common bus. On a platform consisting of multiple bus domains, also minimize data sharing across bus domains.

One technique to minimize sharing of data is to copy data to local stack variables if it is to be accessed repeatedly over an extended period. If necessary, results from multiple threads can be combined later by writing them back to a shared memory location. This approach can also minimize the time spent synchronizing access to shared data.

8.6.2.2 Batched Producer-Consumer Model

The key benefit of a threaded producer-consumer design, shown in Figure 8-5, is to minimize bus traffic while sharing data between the producer and the consumer using a shared second-level cache. On an Intel Core Duo processor, when the work buffers are small enough to fit within the first-level cache, re-ordering of producer and consumer tasks is necessary to achieve optimal performance. This is because fetching data from L2 to L1 is much faster than having a cache line in one core invalidated and fetched from the bus.

Figure 8-5 illustrates a batched producer-consumer model that can be used to overcome the drawback of using small work buffers in a standard producer-consumer model. In a batched producer-consumer model, each scheduling quantum batches two or more producer tasks, each producer working on a designated buffer. The number of tasks to batch is determined by the criterion that the total working set be greater than the first-level cache but smaller than the second-level cache.

[Figure: Main Thread; producer tasks P(1) to P(6); consumer tasks C(1) to C(4). P: producer, C: consumer]

Figure 8-5. Batched Approach of Producer Consumer Model
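The batching criterion above (total working set greater than the first-level cache but smaller than the second-level cache) can be turned into a small sizing helper. The following is a minimal sketch in C, not code from the manual: the function name `batch_count`, its parameters, and the choice to return the smallest batch count that meets the criterion are all assumptions made for illustration. It also assumes every producer buffer has the same size.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the manual): choose how many producer
 * tasks to batch per scheduling quantum so that the combined working set
 * is larger than the first-level cache but smaller than the second-level
 * cache. Assumes all producer buffers are buffer_bytes in size.
 *
 * Returns the smallest batch count >= 2 that satisfies both bounds,
 * or 0 if no such count exists.
 */
size_t batch_count(size_t buffer_bytes, size_t l1_bytes, size_t l2_bytes)
{
    if (buffer_bytes == 0 || l2_bytes <= l1_bytes)
        return 0;

    /* Smallest n such that n * buffer_bytes > l1_bytes. */
    size_t n = l1_bytes / buffer_bytes + 1;

    /* The text calls for batching two or more producer tasks. */
    if (n < 2)
        n = 2;

    /* Require n * buffer_bytes < l2_bytes; dividing avoids overflow. */
    if (n > (l2_bytes - 1) / buffer_bytes)
        return 0;

    return n;
}
```

For example, with illustrative sizes of a 32 KB first-level cache, a 2 MB second-level cache, and 8 KB work buffers, the helper returns 5, giving a 40 KB working set per batch. Picking the smallest qualifying count is one possible policy; any count whose working set stays below the second-level cache size also meets the stated criterion.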
