25.06.2015 Views

Administering Platform LSF - SAS


Chapter 6

Managing Jobs

Pending jobs

A job remains pending until all conditions for its execution are met. Some of the conditions are:

◆ Start time specified by the user when the job is submitted

◆ Load conditions on qualified hosts

◆ Dispatch windows during which the queue can dispatch and qualified hosts can accept jobs

◆ Run windows during which jobs from the queue can run

◆ Limits on the number of job slots configured for a queue, a host, or a user

◆ Relative priority to other users and jobs

◆ Availability of the specified resources

◆ Job dependency and pre-execution conditions

Viewing pending reasons

Use the bjobs -p command to display the reason why a job is pending.

Suspended jobs

A job can be suspended at any time by its owner, by the LSF administrator, by the root user (superuser), or by LSF.

After a job has been dispatched and started on a host, it can be suspended by LSF. When a job is running, LSF periodically checks the load level on the execution host. If any load index is beyond either its per-host or its per-queue suspending conditions, the lowest-priority batch job on that host is suspended.

If the load on the execution host or hosts becomes too high, batch jobs may be interfering with each other or with interactive jobs. In either case, some jobs should be suspended to maximize host performance or to guarantee interactive response time.

LSF suspends jobs according to the priority of the job’s queue. When a host is busy, LSF suspends lower-priority jobs first unless the scheduling policy associated with the job dictates otherwise.
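The load check described above can be sketched in a few lines of Python. This is an illustration of the selection rule only, not LSF's scheduler: the `Job` class, the function names, and the index names in the usage example are hypothetical. It also assumes that a load index is "beyond" its threshold when it is numerically greater (which suits indices such as run-queue length but not indices where a low value signals trouble), that a single set of per-queue thresholds applies, and that a larger queue priority number means higher priority.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    job_id: int
    queue_priority: int  # assumption: larger number = higher-priority queue
    state: str = "RUN"


def exceeded_indices(load: Dict[str, float],
                     host_stop: Dict[str, float],
                     queue_stop: Dict[str, float]) -> List[str]:
    """Return the load indices that are above a per-host or per-queue threshold."""
    exceeded = []
    for name, value in load.items():
        # Collect whichever thresholds are defined for this index.
        limits = [t[name] for t in (host_stop, queue_stop) if name in t]
        if any(value > limit for limit in limits):
            exceeded.append(name)
    return exceeded


def pick_job_to_suspend(jobs: List[Job],
                        load: Dict[str, float],
                        host_stop: Dict[str, float],
                        queue_stop: Dict[str, float]) -> Optional[Job]:
    """Pick the lowest-priority running job if any suspending condition is met."""
    if not exceeded_indices(load, host_stop, queue_stop):
        return None
    running = [job for job in jobs if job.state == "RUN"]
    if not running:
        return None
    return min(running, key=lambda job: job.queue_priority)
```

For example, with jobs at queue priorities 40, 20, and 30 on a host whose measured load is above one threshold, `pick_job_to_suspend` returns the job from the priority-20 queue. A scheduling policy that overrides this order would need extra logic that the sketch does not model.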

Jobs are also suspended by the system if the job queue has a run window and the current time goes outside the run window.

A system-suspended job can later be resumed by LSF if the load condition on the execution hosts falls low enough or when the closed run window of the queue opens again.
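The run-window behavior can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not LSF code: the function names are hypothetical, the window is a single daily interval that may wrap past midnight, and the state names `RUN` and `SSUSP` (system-suspended) are used as labels only. Load-based suspension is deliberately left out, so a job suspended for load reasons would need a separate check before resuming.

```python
from datetime import time


def in_run_window(now: time, opens: time, closes: time) -> bool:
    """Return True if `now` falls inside the daily run window."""
    if opens <= closes:
        return opens <= now < closes
    # The window wraps past midnight, for example 20:00 to 08:30.
    return now >= opens or now < closes


def next_state(state: str, now: time, opens: time, closes: time) -> str:
    """Suspend a running job outside the window; resume it when the window reopens."""
    inside = in_run_window(now, opens, closes)
    if state == "RUN" and not inside:
        return "SSUSP"
    if state == "SSUSP" and inside:
        return "RUN"
    return state
```

With a window that opens at 20:00 and closes at 08:30, a running job checked at noon moves to `SSUSP`, and the same job checked again at 21:00 moves back to `RUN`.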

Viewing suspension reasons<br />

Use the bjobs -s command to display the reason why a job was suspended.
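For scripted checks, bjobs -s and its pending-job counterpart bjobs -p can be wrapped in a small Python helper. The sketch below is one possible wrapper, not part of LSF: the function names are hypothetical, and it assumes that `bjobs` is on the `PATH` and accepts an optional job ID after the option. The output is returned as raw text because its exact layout is not specified here.

```python
import shutil
import subprocess
from typing import List, Optional


def bjobs_reason_command(kind: str, job_id: Optional[int] = None) -> List[str]:
    """Build the bjobs command line for pending (-p) or suspended (-s) reasons."""
    flags = {"pending": "-p", "suspended": "-s"}
    if kind not in flags:
        raise ValueError(f"kind must be one of {sorted(flags)}, got {kind!r}")
    cmd = ["bjobs", flags[kind]]
    if job_id is not None:
        # Assumption: a job ID may follow the option to restrict the report.
        cmd.append(str(job_id))
    return cmd


def show_reasons(kind: str, job_id: Optional[int] = None) -> str:
    """Run bjobs and return its standard output unparsed."""
    if shutil.which("bjobs") is None:
        raise RuntimeError("bjobs not found on PATH; is the LSF environment loaded?")
    result = subprocess.run(
        bjobs_reason_command(kind, job_id),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling `show_reasons("suspended")` on a host with the LSF environment loaded would return whatever bjobs -s prints for the current user.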

WAIT state (chunk jobs)<br />

If you have configured chunk job queues, members of a chunk job that are waiting to run are displayed as WAIT by bjobs. Any jobs in WAIT status are included in the count of pending jobs by bqueues and busers, even though the entire chunk job has been dispatched and occupies a job slot. The bhosts command counts the entire chunk job as the single job slot it occupies in the number of jobs shown in the NJOBS column.
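The difference between the two counts can be made concrete with a short Python sketch. It only models the accounting described above and is not how LSF computes its reports: the function name, the dictionary keys, and the `other_pending` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def chunk_job_counts(member_states: List[str], other_pending: int = 0) -> Dict[str, int]:
    """Model how one dispatched chunk job shows up in pending and slot counts."""
    states = Counter(member_states)
    return {
        # WAIT members are reported as pending by bqueues and busers.
        "pending_reported": other_pending + states["WAIT"],
        # The whole chunk job occupies one job slot on the host (bhosts NJOBS).
        "slots_on_host": 1 if member_states else 0,
    }
```

For a chunk job with one running member and three members in WAIT, plus two ordinary pending jobs in the queue, the sketch reports five pending jobs but only one occupied slot.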
