
Condor & Grid

HCC Informal Condor Tutorial


Today’s Activities…

• We will give a short tutorial on using Condor on both HCC and remote resources.

– We will cover basic job submission (like a batch system!).

– We will also cover some advanced techniques that make Condor more useful for managing many, many jobs.

– Hands on! I want to make sure everyone here gets a chance to run their own jobs if they want.


Condor & Grid

• Condor – Resource Scavenger

– Think PBS/SGE

– Has Grid extensions (Condor-G)

• Grid – Open Science Grid (OSG)

– High-throughput Grid

– Serial jobs


Open Science Grid


Open Science Grid vs. TeraGrid

  Open Science Grid               TeraGrid
  High throughput (lots of jobs)  High performance (big jobs)
  Serial jobs                     MPI / OpenMP
  Free signup                     Restricted signup
  Opportunistic                   Resource allocation


OSG Ideal Workflow

• Lots of independent jobs (no MPI/OpenMP)

• Each job runs < 24 hours

• Portable program


Grid Workflow

1. Run: runAutodock Autodock -p protein -l Ligand

2. Create jobs on the submit host.

3. Send jobs for execution to remote sites (Firefly, Wisconsin, Red).

4. Save output to SRM storage.

[Diagram: the runAutodock command fans out into multiple Autodock jobs on the submit host; each remote site runs jobs and returns its output to SRM storage.]


Condor & Grid: Step 1

• Submit file:

  universe = vanilla
  executable = /bin/hostname
  output = host.out
  error = host.err
  log = host.log
  queue


Condor & Grid: Step 2

• Submit job:

  condor_submit host.condor

• Check job:

  [dweitzel@hcc-grid condortest]$ condor_q
  -- Submitter: hcc-grid.unl.edu : : hcc-grid.unl.edu
  ID       OWNER     SUBMITTED   RUN_TIME    ST  PRI  SIZE  CMD
  45288.0  dweitzel  12/9 18:17  0+00:00:00  I   0    0.0   hostname


The Most Important Commands

• Submit a job:

– condor_submit <submit file>

• Check your job’s status:

– condor_q

• Remove your jobs:

– condor_rm
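The whole cycle from Step 1 onward can be sketched as a short script. The submit file below is the one from Step 1; the condor_* calls are shown commented out since they require a Condor installation on the submit host.

```shell
#!/bin/sh
# Write the Step 1 submit file, then (with Condor installed) submit,
# monitor, and, if needed, remove the job.
cat > host.condor <<'EOF'
universe = vanilla
executable = /bin/hostname
output = host.out
error = host.err
log = host.log
queue
EOF

# With Condor available on the submit host:
# condor_submit host.condor    # submit the job
# condor_q                     # check its status
# condor_rm <cluster_id>       # remove it if necessary
echo "wrote host.condor"
```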


Using The Grid

• Condor runs as a batch system on Prairiefire.

– But it doesn’t require running on a cluster.

– It doesn’t even require that we run on nodes Nebraska owns!

• We use a system called “glideinWMS” which submits jobs to remote clusters.

– Each job submitted is actually a Condor worker node!

– The job starts on the remote node, launches Condor, and joins our cluster.

– This way, you need to learn barely any grid-specific details. Just use Condor!


Using The Grid

• By using glideinWMS, we can capture a huge number of slots! The plot below shows the last 24 hrs of activity.


Condor & Grid

• A more complicated submit file:

  universe = vanilla
  executable = /usr/bin/wc
  args = hosts
  output = wordcount.out
  error = wordcount.err
  log = wordcount.log
  should_transfer_files = YES
  when_to_transfer_output = ON_EXIT
  transfer_input_files = /etc/hosts
  queue


Condor & Grid: Step 3

• Prepare for grid submission:

– Get a certificate: https://pki1.doegrids.org/ca/

• (We’ll help you out with this afterward.)

– Initialize a proxy:

  voms-proxy-init --voms hcc:/hcc

– The proxy will expire in 12 hours. If you don’t have a current proxy, all commands will fail!
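Since an expired proxy silently breaks every grid command, it can help to check before submitting. A minimal sketch follows; the one-hour threshold is an arbitrary choice for illustration, and the script degrades gracefully when the VOMS tools are not installed.

```shell
#!/bin/sh
# Sketch: warn when the VOMS proxy is missing or close to expiry.
# MIN_LEFT (one hour, in seconds) is an illustrative threshold.
MIN_LEFT=3600

check_proxy() {
    if ! command -v voms-proxy-info >/dev/null 2>&1; then
        echo "voms-proxy-info not installed; cannot check proxy"
    elif left=$(voms-proxy-info --timeleft 2>/dev/null) \
         && [ "${left:-0}" -ge "$MIN_LEFT" ]; then
        echo "proxy ok: ${left}s remaining"
    else
        echo "proxy missing or expiring; run: voms-proxy-init --voms hcc:/hcc"
    fi
}

# Record the result so submit scripts can inspect it.
check_proxy | tee proxy_check.log
```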


Condor & Grid: Step 4

• Modify for grid submission.

– No changes to the Condor submit file.

– You must submit from glidein.unl.edu.

• You can use your HCC account to log in here, but you must inform us first; we’ll then create a new home directory for you there.

– Modify your job so it does not depend on the shared file system.

• I.e., if you need a software package, you can’t install it yourself – you need our help.


Condor & Grid: Step 5

• Condor will automatically transfer files back and forth.

– This works well for up to 100-200 MB per job.

– Use no more than ~10 input and output files. Zip small files together!

  should_transfer_files = YES
  when_to_transfer_output = ON_EXIT
  transfer_input_files = file1,file2
  transfer_output_files = file3,file4
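The "zip small files together" advice above can be sketched as follows; the file names are made up, and tar is used since it is available everywhere (zip works just as well).

```shell
#!/bin/sh
# Bundle many small input files into one archive so the submit file's
# transfer_input_files line stays short (file names here are made up).
mkdir -p inputs
for i in 1 2 3 4 5; do
    echo "data $i" > "inputs/part$i.txt"
done
tar czf inputs.tar.gz -C inputs .

# The submit file then transfers just the archive:
#   transfer_input_files = inputs.tar.gz
# and the job unpacks it as its first step:
#   tar xzf inputs.tar.gz
echo "bundled $(tar tzf inputs.tar.gz | grep -c part) files"
```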


Running many jobs

• Need to run the same thing 100 times? No problem!

– Or maybe change it slightly each time?

  universe = vanilla
  executable = /usr/bin/wc
  args = hosts
  output = wordcount.out.$(Process)
  error = wordcount.err.$(Process)
  log = wordcount.log
  should_transfer_files = YES
  when_to_transfer_output = ON_EXIT
  transfer_input_files = /etc/hosts
  queue 10
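To see how $(Process) separates the jobs, the sketch below writes the submit file from the slide and previews the output file names Condor will produce: $(Process) expands to 0 through 9 for "queue 10", so each job gets its own output and error file.

```shell
#!/bin/sh
# Write the "queue 10" submit file; $(Process) expands to 0..9,
# giving each of the 10 jobs distinct output and error files.
cat > wordcount.condor <<'EOF'
universe = vanilla
executable = /usr/bin/wc
args = hosts
output = wordcount.out.$(Process)
error = wordcount.err.$(Process)
log = wordcount.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = /etc/hosts
queue 10
EOF

# Preview the per-job file names Condor will substitute:
for p in 0 1 2 3 4 5 6 7 8 9; do
    echo "job $p -> wordcount.out.$p"
done
```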


Connecting Jobs with DAG scripts

[Diagram: Job A → Job B → Job C, a linear dependency chain.]


Defining the DAG

• You need to write a *.dag file, which references Condor submit files. This file describes the jobs to run and their dependencies.

• The file below describes the linear dependency graph on the previous slide.

  Job A a.sub
  Job B b.sub
  Job C c.sub
  Parent A Child B
  Parent B Child C
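Putting the pieces together, the sketch below writes the three-job linear DAG from the slide along with minimal stub submit files for A, B, and C (/bin/hostname stands in for real work); the file names are illustrative.

```shell
#!/bin/sh
# Write the linear DAG from the slide plus stub submit files
# (a.sub, b.sub, c.sub) that each run /bin/hostname as placeholder work.
for j in a b c; do
    cat > "$j.sub" <<EOF
universe = vanilla
executable = /bin/hostname
output = $j.out
error = $j.err
log = $j.log
queue
EOF
done

cat > linear.dag <<'EOF'
Job A a.sub
Job B b.sub
Job C c.sub
Parent A Child B
Parent B Child C
EOF

# With Condor installed, submit the whole graph at once:
# condor_submit_dag linear.dag
echo "wrote linear.dag and stub submit files"
```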


Submit & Run

• Instead of using the “condor_submit” command, use “condor_submit_dag”.

• DAGMan can be used for job dependency graphs of up to 100,000 jobs.

– But let me recommend something smaller to start with…


Saving large outputs

• Sometimes the output is too large to save with Condor.

• In this case, you want to use a special protocol called SRM for file transfers.

• The commands you will use start with lcg-*

– “lcg-cp -b -D srmv2 <source> <destination>”

• Does not work recursively. One file at a time.

– “lcg-ls URL”

– The URL you will use is:

• srm://redsrm1.unl.edu:8443/srm/v2/server?SFN=/mnt/hadoop/user/<username>/<directory>/filename
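As a sketch of how the pieces of that URL fit together, the script below composes an lcg-cp command from its parts and only echoes it, since actually running lcg-cp requires a valid grid proxy. The username, directory, and file name are placeholders, not real paths.

```shell
#!/bin/sh
# Compose the SRM destination URL and the lcg-cp command without
# running it (lcg-utils requires a valid grid proxy). The username,
# directory, and file names below are placeholders.
SRM_BASE="srm://redsrm1.unl.edu:8443/srm/v2/server?SFN=/mnt/hadoop"
user="someuser"                         # hypothetical username
src="file://$PWD/output.tar.gz"         # hypothetical local file
dest="$SRM_BASE/user/$user/results/output.tar.gz"

cmd="lcg-cp -b -D srmv2 $src $dest"
echo "$cmd" | tee lcg_cmd.txt
# Remember: lcg-cp copies one file at a time (no recursion).
```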
