Intel® Trace Analyzer User's Reference Guide
Intel® Trace Analyzer User's Reference Guide
Intel® Trace Analyzer User's Reference Guide
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Intel® <strong>Trace</strong> <strong>Analyzer</strong> <strong>User's</strong> <strong>Reference</strong> <strong>Guide</strong><br />
Chapter 10. Concepts<br />
This Chapter contains explanations of some abstract concepts of Intel® <strong>Trace</strong><br />
<strong>Analyzer</strong>.<br />
Level of Detail<br />
Tracing all available events over time can generate billions of events even for a<br />
moderate program runtime of a few minutes and a handful of CPUs. The sheer<br />
amount of data is a challenge for any analysis tool that has to cope with this data.<br />
This is even worse as in most cases the analysis tool cannot make use of the same<br />
system resources as the parallel computer on which the trace was generated.<br />
An aspect of this problem arises when generating graphical diagrams of the event<br />
data. Obviously, it is next to impossible to graphically display all the data. Firstly,<br />
it would take ages to do that. Secondly, it would depend on round-off errors in<br />
the scaling and on the order of the data traversal which events would actually<br />
make it to the screen without being erased by others. So it is clear that only<br />
representatives of the actual events are shown.<br />
A valid choice would be to paint only every 100th or 1000th event and to hope<br />
that the resulting diagram gives a valid impression of the data. But this approach<br />
has its problems, because the pattern selects the representatives can interfere<br />
with the patterns in the underlying data.<br />
The Intel <strong>Trace</strong> <strong>Analyzer</strong> uses a Level of Detail concept to solve this problem. The<br />
Event Timeline Chart (as the other timelines) calculates a hint for the analysis<br />
that describes a time span that can reasonably be painted and selected with the<br />
mouse. This hint is called Resolution. The resolution requested by the timeline<br />
takes into account the currently available screen space and the length of the<br />
current time interval. Hence a higher screen resolution or a wider timeline results<br />
in more data being displayed for the same time interval.<br />
The Intel <strong>Trace</strong> <strong>Analyzer</strong> then tries to find a near match for the requested<br />
resolution. The exact resolution depends on internals, which will not be discussed<br />
here.<br />
The Intel <strong>Trace</strong> <strong>Analyzer</strong> divides the requested time interval into slots of length<br />
resolution. After that, representatives for the function events, the messages and<br />
the collectives in these slots are chosen in a deterministic way. If a functions<br />
spans more than the given resolution it results in a larger slot.<br />
The representatives for function events are chosen as follows: for each slot and<br />
each process (or thread group respectively) there is only a single function event<br />
representing the function where the thread or group spent most of its time.<br />
The representatives for messages are chosen as follows: for each tuple (sender,<br />
receiver, sender slot, receiver slot) only one message is generated that carries<br />
averaged attributes. These attributes are averaged over all messages matching<br />
the tuple.<br />
The representatives for collective operations are chosen as follows: for each<br />
tuple (communicator, first slot) one collective operation is generated. So it can<br />
happen that an operation of type MPI_Gather is merged with an operation of type<br />
MPI_Bcast resulting in a merged operation with no particular type at all (mixed).<br />
To prevent misconceptions, emphasis is given to the fact that the merging of<br />
events only applies to the timelines and not to the profiles. The profiles always<br />
Document number: 318120-002 102