30.01.2013

TotalView Users Guide - CI Wiki




SLURM is an open-source resource manager designed for Linux clusters<br />

of all sizes. It provides three key functions. First, it allocates exclusive and/or<br />

non-exclusive access to resources (compute nodes) to users for some<br />

duration of time so they can perform work. Second, it provides a framework<br />

for starting, executing, and monitoring work (typically a parallel job)<br />

on a set of allocated nodes. Finally, it arbitrates conflicting requests for<br />

resources by managing a queue of pending work.<br />
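The three functions above can be illustrated with a small toy model. This is only a sketch: the class `ToyResourceManager`, its methods, and the node names are hypothetical and are not part of SLURM's API. It assumes exclusive allocation and a strict first-in, first-out queue, which is a simplification of what a real resource manager does.

```python
from collections import deque


class ToyResourceManager:
    """Toy model of allocate / run / queue; NOT SLURM's API (hypothetical names)."""

    def __init__(self, nodes):
        self.free = set(nodes)      # nodes not allocated to any job
        self.running = {}           # job name -> set of allocated nodes
        self.pending = deque()      # FIFO queue of (job name, node count)

    def submit(self, job, count):
        """Queue a request, then start whatever fits."""
        self.pending.append((job, count))
        self._schedule()

    def finish(self, job):
        """Release a job's nodes and let queued work start."""
        self.free |= self.running.pop(job)
        self._schedule()

    def _schedule(self):
        # Strict FIFO (an assumption): stop at the first request that does not fit.
        while self.pending and self.pending[0][1] <= len(self.free):
            job, count = self.pending.popleft()
            nodes = set(sorted(self.free)[:count])
            self.free -= nodes
            self.running[job] = nodes   # exclusive allocation


rm = ToyResourceManager(["n1", "n2", "n3", "n4"])
rm.submit("a", 3)
rm.submit("b", 2)   # only one node is free, so "b" waits in the queue
print(sorted(rm.running), list(rm.pending))
rm.finish("a")      # releasing "a" lets "b" start
print(sorted(rm.running), list(rm.pending))
```

In this model, job "b" stays pending until job "a" releases its nodes, which is the arbitration role the queue plays.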

SLURM is not a sophisticated batch system, but it does provide an Application<br />

Programming Interface (API) for integration with external schedulers<br />

such as the Maui Scheduler. While other resource managers do exist,<br />

SLURM is unique in several respects:<br />

• Its source code is freely available under the GNU General Public<br />

License.<br />

• It is designed to operate in a heterogeneous cluster with up to thousands<br />

of nodes.<br />

• It is portable; written in C with a GNU autoconf configuration engine.<br />

While initially written for Linux, other UNIX-like operating systems<br />

should be easy porting targets. A plugin mechanism exists to support<br />

various interconnects, authentication mechanisms, schedulers, etc.<br />

• SLURM is highly tolerant of system failures, including failure of the node<br />

executing its control functions.<br />

• It is simple enough for the motivated end user to understand its source<br />

and add functionality.<br />

Debugging IBM Cell Broadband<br />

Engine Programs<br />

The IBM Cell Broadband Engine is a heterogeneous computer having a PPE<br />

(PowerPC Processor Element) and eight SPEs (Synergistic Processor Elements).<br />

Although the Cell is heterogeneous, the way you debug Cell<br />

programs is nearly identical to the way you use TotalView to debug programs<br />

running on other architectures. (See Figure 98 on page 139.)<br />

Of course, the way in which programs are loaded and executed means there<br />

are a few differences. For example, when a context is created on an SPU<br />

(Synergistic Processor Unit), this context is not initialized; instead,<br />

resources are simply allocated. This empty context is visible, but there are<br />

no stack traces or source displays. There is no equivalent to this on other<br />

architectures that TotalView supports. At a later time, the PPU (PowerPC<br />

Processor Unit) will load an SPU image into this context and tell it to run.<br />
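The context lifecycle described above (created empty, loaded later, then run) can be sketched as a small state model. This is an illustration only: `ToySpuContext`, its methods, the image name `spu_kernel.elf`, and the placeholder stack frame are all hypothetical, and none of them correspond to a TotalView or Cell SDK interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToySpuContext:
    """Toy model of an SPU context; NOT a TotalView or Cell SDK structure."""
    image: Optional[str] = None   # None until the PPU loads an SPU image
    running: bool = False

    def load(self, image: str) -> None:
        """Model the PPU loading an SPU image into the context."""
        self.image = image

    def run(self) -> None:
        """Model the PPU telling the context to run."""
        if self.image is None:
            raise RuntimeError("no SPU image loaded")
        self.running = True

    def stack_trace(self) -> List[str]:
        # An empty context has resources allocated but no code, so there is
        # nothing to display. The frame text below is a made-up placeholder.
        return [f"main ({self.image})"] if self.running else []


ctx = ToySpuContext()            # created: resources only, no image
print(ctx.stack_trace())         # nothing to show for an empty context
ctx.load("spu_kernel.elf")       # hypothetical image name
ctx.run()
print(ctx.stack_trace())
```

The point of the model is that a context can exist and be visible before it has any code, which is why a debugger can show it without a stack trace or source display.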

In all cases, when you focus on a PPU thread, only the PPU address space is<br />

visible. Similarly, when you focus on an SPU thread, the address space of<br />

that SPU thread is visible.<br />

138 Chapter 7: Setting Up Parallel Debugging Sessions
