Job Submission

The queue running on Frank uses Torque (a PBS derivative) and Moab, and jobs are submitted to the queues using the qsub (or msub) command.

Interactive jobs

Short test jobs may be submitted to the queues in an interactive manner. When an interactive job is accepted by the queueing system and starts to run, the job submitter is logged into a compute node and presented with a command prompt from which they can issue job commands.

As a simple example of using qsub, the following command submits a job to the batch queue to run interactively on one CPU core for 5 minutes of wallclock time:

$ qsub -I -q batch -l nodes=1:ppn=1,walltime=5:00
qsub: waiting for job xxxxx.headnode0.frank.sam.pitt.edu to start
qsub: job xxxxx.headnode0.frank.sam.pitt.edu ready
 
-bash-3.2$ 
-bash-3.2$ hostname
n27

The -I flag requests an interactive job. After the job starts (is ready), a command prompt is presented, at which the user types the hostname command; its output shows that the user's job has started on compute node n27.
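
Larger interactive sessions can be requested in the same way; the core count and wallclock time below are illustrative values only:

$ qsub -I -q batch -l nodes=1:ppn=4,walltime=30:00

Typing exit (or pressing Ctrl-D) at the prompt ends the interactive job and returns the allocated cores to the queue.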

Batch jobs

A more convenient way of running jobs on Frank is to write a job submission script and submit it to a queue. When the requested resources become available, the commands in the script run unattended on the allocated compute node(s).

For example, a simple job submission script looks like the following:

#!/bin/bash
#PBS -N example1
#PBS -o example1.out
#PBS -e example1.err
#PBS -l nodes=2:ppn=4
#PBS -l vmem=4GB
#PBS -l walltime=1:30:05
#PBS -q batch
 
cd $LOCAL
prun myjob.x > job.output
cp job.output $PBS_O_WORKDIR

Lines 2 through 8 of the script are PBS directives, which are used to specify how many resources the job requires, where it should run, and where to save the standard output and error.

The last 3 lines are the Linux commands used to run the job. These are the same commands that would be entered at the command prompt in an interactive session (such as the one above).
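
Once the script has been saved to a file (example1.pbs is a hypothetical filename), it is submitted with qsub, and the job's progress can then be followed with qstat:

$ qsub example1.pbs
xxxxx.headnode0.frank.sam.pitt.edu
$ qstat -u $USER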

The syntax used for requesting resources with the PBS directives is discussed in more detail in the section Job submission script examples.

The following sections discuss how to request different resources (that is, how to run jobs with differing requirements) using PBS directives in a job submission script.

Running a parallel calculation

To run a parallel calculation over 48 cores, users could include the following line in their submission script:

#PBS -l nodes=1:ppn=48

to force the calculation to run on a single Magny Cours node, or else

#PBS -l nodes=4:ppn=12

which may run on a single Magny Cours node with all 48 cores, or on four Westmere nodes with 12 cores each. It could also run on more than one Magny Cours node (perhaps over two Magny Cours nodes with 24 cores each) if those resources become available before either of the previous two scenarios.
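
Putting this together, a complete submission script for a 48-core parallel job might look like the following sketch; the job name, executable (mympi.x), and walltime are placeholders to be replaced with your own values:

#!/bin/bash
#PBS -N parallel48
#PBS -l nodes=4:ppn=12
#PBS -l walltime=2:00:00
#PBS -q batch

cd $LOCAL
prun mympi.x > mympi.output
cp mympi.output $PBS_O_WORKDIR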

Running on a specific hardware type

Jobs submitted to the batch (default) queue may run on either the Magny Cours or Westmere nodes, unless a specific CPU type is explicitly requested.

For some users' jobs, this will reduce the queue time. However, some users may need to run on a specific hardware type; this can be enforced by appending the CPU type to the node request line of the submission script, e.g.,

#PBS -l nodes=1:ppn=12:westmere

or

#PBS -l nodes=1:ppn=24:magny_cours
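
The CPU-type property can also be combined with other resource requests on the same line; the node count and walltime below are illustrative:

#PBS -l nodes=2:ppn=12:westmere,walltime=4:00:00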

Serial jobs

The hardware in Frank is designed primarily for parallel calculations, but serial jobs are also permitted. An important caveat with serial jobs on Frank is that they must run on the Magny Cours nodes with gigabit Ethernet.

A request for a serial job with the following line in the submission script:

#PBS -l nodes=1:ppn=1

will automatically direct the job to the Magny Cours nodes.
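
A minimal serial submission script might therefore look like the following sketch, where the job name, executable (myserial.x), and walltime are placeholders:

#!/bin/bash
#PBS -N serial_example
#PBS -l nodes=1:ppn=1
#PBS -l walltime=1:00:00
#PBS -q batch

cd $LOCAL
./myserial.x > serial.output
cp serial.output $PBS_O_WORKDIR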

Shared-memory jobs

Shared-memory jobs, such as those using OpenMP directives, should be submitted to a single node with the following line in the submission script:

#PBS -l nodes=1:ppn=10
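
For an OpenMP code, the number of threads is typically set to match the requested cores per node; the executable (myomp.x) and walltime in this sketch are placeholders:

#!/bin/bash
#PBS -N omp_example
#PBS -l nodes=1:ppn=10
#PBS -l walltime=1:00:00
#PBS -q batch

export OMP_NUM_THREADS=10   # match the ppn request above

cd $LOCAL
./myomp.x > omp.output
cp omp.output $PBS_O_WORKDIR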

Large shared-memory jobs

The largest amount of memory housed in an individual node (i.e., available as shared memory) is 256 GB. The Magny Cours nodes have this memory installed.

A job that requires this large amount of memory may be requested by including the following two lines in the job submission script:

#PBS -l nodes=1:ppn=1
#PBS -l vmem=254GB

A job of this type may spend some time in the queue, as it must wait until all jobs on a node complete so that it can access all of the node's memory. Note that the request is slightly below the full 256 GB, which leaves a small amount of memory for the operating system.

Short test jobs

Short jobs may be submitted to the test queue for testing/debugging purposes. These jobs must have a wallclock limit of 10 minutes or less and may request at most 16 CPU cores (over 2 nodes). For example, the following lines in a job submission script could be used for testing:

#PBS -l nodes=2:ppn=8
#PBS -l walltime=10:00
#PBS -q test

There are always 16 CPU cores reserved for test/debug jobs.