13.07.2015 Views

Intel® 64 and IA-32 Architectures Optimization Reference Manual

USING PERFORMANCE MONITORING EVENTS

• Some events, such as writebacks, may behave non-deterministically from run to run. In such cases, only measurements collected in the same run yield meaningful ratio values.

B.5.3 Notes on Selected Events

This section provides event-specific notes for interpreting performance events listed in Appendix A of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B.

• L2_Reject_Cycles, event number 30H: This event counts the cycles during which the L2 cache rejected new access requests.

• L2_No_Request_Cycles, event number 32H: This event counts cycles during which no requests from the L1 or prefetches to the L2 cache were issued.

• Unhalted_Core_Cycles, event number 3CH, unit mask 00H: This event counts the smallest unit of time recognized by an active core.
  In many operating systems, the idle task is implemented using the HLT instruction. In such operating systems, clock ticks for the idle task are not counted. A transition due to Enhanced Intel SpeedStep Technology may change the operating frequency of a core. Therefore, using this event to initiate time-based sampling can create artifacts.

• Unhalted_Ref_Cycles, event number 3CH, unit mask 01H: This event guarantees a uniform interval for each cycle being counted. Specifically, counts increment at bus clock cycles while the core is active. The cycles can be converted to the core clock domain by multiplying by the bus ratio, which sets the core clock frequency.

• Serial_Execution_Cycles, event number 3CH, unit mask 02H: This event counts the bus cycles during which the core is actively executing code (non-halted) while the other core in the physical processor is halted.

• L1_Pref_Req, event number 4FH, unit mask 00H: This event counts the number of times the Data Cache Unit (DCU) requests to prefetch a data cache line from the L2 cache. Requests can be rejected when the L2 cache is busy. Rejected requests are re-submitted.

• DCU_Snoop_to_Share, event number 78H, unit mask 01H: This event counts the number of times the DCU is snooped for a cache line needed by the other core. The cache line is either missing in the L1 instruction cache or data cache of the other core, or it is set to read-only when the other core wants to write to it. These snoops are done through the DCU store port. Frequent DCU snoops may conflict with stores to the DCU, and this may increase store latency and impact performance.

• Bus_Not_In_Use, event number 7DH, unit mask 00H: This event counts the number of bus cycles for which the core does not have a transaction waiting for completion on the bus.
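
The clock-domain conversion described for Unhalted_Ref_Cycles can be sketched in a few lines of Python. This is a minimal illustration, not part of the manual: the function names and the sample counter values are hypothetical, and the bus ratio is assumed to be known for the processor under test. The second function is an illustrative ratio the text does not define. It compares Serial_Execution_Cycles with Unhalted_Ref_Cycles, which both count in bus cycles, and it assumes both counts were collected in the same run.

```python
def ref_cycles_to_core_cycles(unhalted_ref_cycles, bus_ratio):
    """Convert bus-clock counts to the core clock domain.

    Multiplies the Unhalted_Ref_Cycles count by the bus ratio, as the
    note on that event describes. bus_ratio is an assumed, known value.
    """
    if bus_ratio <= 0:
        raise ValueError("bus_ratio must be positive")
    return unhalted_ref_cycles * bus_ratio


def serial_execution_fraction(serial_execution_cycles, unhalted_ref_cycles):
    """Fraction of this core's active bus cycles spent running alone.

    Illustrative ratio (an assumption, not defined in the manual):
    both inputs are bus-cycle counts taken in the same run.
    """
    if unhalted_ref_cycles <= 0:
        raise ValueError("unhalted_ref_cycles must be positive")
    return serial_execution_cycles / unhalted_ref_cycles


# Hypothetical counter readings from a single run.
ref_cycles = 1000
serial_cycles = 250
bus_ratio = 9  # assumed core-to-bus clock multiplier

core_cycles = ref_cycles_to_core_cycles(ref_cycles, bus_ratio)
alone = serial_execution_fraction(serial_cycles, ref_cycles)
print(core_cycles, alone)
```

Because Enhanced Intel SpeedStep Technology can change the core frequency during a run, a converted value of this kind reflects the core clock implied by the given bus ratio rather than the cycles the core actually executed.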
