The Batch System

When you log in to Frank, the interactive environment you are presented with is the login node. The login node is for editing code, transferring files, building and compiling applications, and submitting jobs to the queue system. It is commonly used by many people at the same time, so CPU-intensive tasks, production runs, unit tests, debugging, and similar work should not be done on the login node directly, but instead on the compute nodes through the batch queue system. The batch queue system accepts job scripts (batch scripts) for unattended runs and also supports interactive sessions; both are described on this page.

When you log in, the appropriate queue software should be loaded into your environment automatically. If the batch commands are not working, please try running the following command:

module load queue

Torque, Moab, and Gold together constitute the batch queue system; they provide resource management, scheduling, and accounting, respectively. See their respective pages for more details: Torque Resource Manager, MOAB Workload Manager, GOLD.

The Frank Queues

  • In order to use any of the Frank queues, groups must have an active service unit allocation as defined in our Service Unit Policies. Groups that have been granted an initial yearly allocation of 10,000 service units are considered Non-investors.
  • To query the state of the queues, please read the MOAB Scheduler page.
  • Refer to the Job Submission page for submission examples.
  • Please read the Scratch disk page for more information about local disk usage with the ddisk flag.

Shared Memory

  • All jobs in the Shared Memory queues are limited to 6 days (144 hours) of walltime.

If you wish to run a serial or parallel program entirely within one node, submit the job on the command line as follows

qsub -q <queue> -l nodes=1:ppn=<cores> -l mem=<memory>gb -l ddisk=<scratch>gb

or in a submission script as

#PBS -q <queue>
#PBS -l nodes=1:ppn=<cores>
#PBS -l mem=<memory>gb
#PBS -l ddisk=<scratch>gb

All jobs in the Shared Memory queues are assigned the default memory per core listed in the table below. If your job requires more memory, use the mem keyword in your job submission, where <memory> can be up to the maximum allowed value given in the table. Note that the mem keyword specifies the total memory required across all cores, not the value per core.
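
For illustration, a complete shared-memory submission script might look like the following sketch; the queue name, resource values, walltime, and program name are placeholders, not recommendations.

#!/bin/bash
#PBS -q shared
#PBS -l nodes=1:ppn=4
#PBS -l mem=10gb
#PBS -l ddisk=20gb
#PBS -l walltime=24:00:00

# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR

# Launch the serial or threaded program (placeholder name)
./my_program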

Queue        | Hardware description                        | Memory limitations                 | Local scratch disk limitations  | Maximum cores per user                | SU scale
shared       | 48-core AMD Magny Cours, 1 TB local disk    | 2.6 GB/core default, 32 GB maximum | 1 GB default, 150 GB maximum    | Standard user: 48; Investor: 96       | 0.5
shared_large | 16-core Intel Sandy Bridge, 2 TB local disk | 4 GB/core default, 63 GB maximum   | 113 GB default, 1811 GB maximum | Standard user: 16; Investor: 96       | 1.5
shared_heavy | 16-core Intel Sandy Bridge, 3 TB local disk | 8 GB/core default, 126 GB maximum  | 171 GB default, 2738 GB maximum | 16 cores per job; 2 simultaneous jobs | 1.5
  • Only one job will run per node on the shared_heavy queue.
  • The shared_heavy queue is restricted to users who require all of its resources. Users should submit a request for access with a support ticket.

Distributed Memory

  • All jobs in the Distributed Memory queues are limited to 6 days (144 hours) of walltime.

If you wish to run a distributed memory parallel program, submit the job on the command line as follows

qsub -q <queue> -l nodes=<nodes>:ppn=<cores>

or in a submission script

#PBS -q <queue>
#PBS -l nodes=<nodes>:ppn=<cores>

Note that nodes=1 does not imply a distributed memory parallel calculation; single-node calculations should be run in a shared memory queue.

Queue       | Hardware description               | ppn | Memory limitations | Local scratch size | Minimum nodes per job | Maximum cores per user           | SU scale
distributed | Intel Westmere, QDR InfiniBand     | 12  | 4 GB/core          | 905 GB             | 2                     | Standard user: 72; Investor: 144 | 1.0
idist_short | Intel Westmere, QDR InfiniBand     | 12  | 4 GB/core          | 905 GB             | 1                     | Investor: 72 for 12 hours        | 1.0
dist_small  | Intel Sandy Bridge, QDR InfiniBand | 16  | 2 GB/core          | 905 GB             | 4                     | Standard user: 64; Investor: 288 | 1.5
dist_fast   | Intel Sandy Bridge, FDR InfiniBand | 16  | 8 GB/core          | 905 GB             | 4                     | 192                              | 1.5
dist_ivy    | Intel Ivy Bridge, FDR InfiniBand   | 16  | 4 GB/core          | 905 GB             | 4                     | 160                              | 1.5
  • Important: Please do not submit jobs with ppn less than the number of cores per node.
  • If your job requires more than the posted GB/core, you must still submit with the full ppn value so that no other jobs are scheduled on the same nodes. For MPI programs you can reduce the number of running processes per node with prun -npernode, as sketched below.
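
For example, a distributed-memory script that requests whole nodes but launches fewer MPI ranks per node might look like the following sketch; the queue, node count, walltime, and program name are placeholders.

#PBS -q distributed
#PBS -l nodes=2:ppn=12
#PBS -l walltime=48:00:00

cd $PBS_O_WORKDIR

# All 12 cores per node are requested, but only 6 MPI ranks are started
# on each node, so every rank sees roughly twice the default memory per core.
prun -npernode 6 ./my_mpi_program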

Mixed Usage

These queues allow single-core, single-node and multi-node jobs to run simultaneously. There are no restrictions on the minimum number of cores that can be requested. The maximum walltime is 6 days.

Queue    | Hardware description               | ppn | Memory limitations | Local scratch size | Maximum cores per user           | SU scale
dist_big | Intel Sandy Bridge, QDR InfiniBand | 16  | 4 GB/core          | 905 GB             | Standard user: 64; Investor: 288 | 1.5
  • If your job requires more than the posted GB/core, you must still submit with the full ppn value so that no other jobs are scheduled on the same nodes. For MPI programs you can reduce the number of running processes per node with prun -npernode.

GPU queue

The gpu queue runs on the 10 nodes that have NVIDIA GPGPUs; each of these nodes has 4 cards.

Jobs in the gpu queue are divided into two categories, long and short.

  • Each user will be allowed to run up to 8 short jobs. When demand for the GPU resources is low, a user can run up to 16 jobs.

  • Each user will be allowed to run up to 2 long jobs.

There are three GPU models installed on Frank nodes; a specific model can be requested with -l feature=<model>.

Model           | Total number of cards | Memory (GB)
c2050 (RETIRED) | 12                    | 2
c2075 (RETIRED) | 4                     | 6
titan           | 24                    | 6

qsub -q gpu -l nodes=1:ppn=x:gpus=y,feature=<model>

where x<=12 and y<=4. If gpus=y is not included, Moab will not schedule the job and it may be canceled. Please use this resource wisely with regard to the number of cores and GPUs requested.
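
For example, a request for one GPU and three cores on a single node might look like the following; the script name is a placeholder, and no feature flag is given, so the job could run on any available model.

qsub -q gpu -l nodes=1:ppn=3:gpus=1 my_gpu_job.sh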

  • Access to the Titan cards is restricted.

Test queue

The test queue is intended for short jobs to verify programs and input.

Jobs submitted to the test queue will run on nodes with Intel Xeon (Nehalem) CPUs. There are 8 cores per node, and 4 nodes available in test. Each node has a total of 12 GB of RAM and 230 GB of local scratch disk.

There is a maximum wallclock limit of 2 hours and a maximum of 16 cores (nodes=2:ppn=8) per job.
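
For example, a maximal test-queue request (2 nodes with 8 cores each, for 2 hours) could be submitted as follows; the script name is a placeholder.

qsub -q test -l nodes=2:ppn=8 -l walltime=2:00:00 my_test_job.sh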

Development queue

The dev queue is intended for debugging and code development runs and is limited to 15 minutes of walltime. The dev queue can use any of the following node types by selecting the corresponding feature in the job submission; an example is given below. CPU usage on this queue is not charged to your Service Unit account.

qsub -q dev -l nodes=1:ppn=<cores>:<feature>

Node type          | Feature        | Maximum cores per job
AMD Magny Cours    | magny_cours    | 48
Intel Westmere     | westmere       | 12
Intel Sandy Bridge | sandybridge    | 16
Intel Ivy Bridge   | ivybridge      | 16
GPU                | c2050 or c2075 | ppn=12:gpus=4
  • If no feature is requested, the job will run on the first available node among the node types listed above.

  • If the number of cores (ppn) exceeds the number available for the chosen node type, the job will be canceled.

  • If the requested node type is not one of those listed above, the job will never run.
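
For example, a 15-minute debugging run on a single Westmere node might be requested as follows; the script name is a placeholder.

qsub -q dev -l nodes=1:ppn=12:westmere -l walltime=00:15:00 my_debug_job.sh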

Quality of Service

In Moab, every job is assigned a priority when it is submitted, and the job with the highest priority runs next. The initial priority of a job is determined by which QoS level the job has been granted. Run myQOS to determine your QoS access; you will need to have the sys environment module loaded first.
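
For example, assuming the sys environment module is loaded with the standard module command:

module load sys
myQOS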

By default all jobs are run under the standard QoS. The initial job priority for the standard QoS is 10000.

Jobs submitted by groups that have invested in SaM will automatically be run under the investor QoS. The initial job priority for the investor QoS is 100000.

Jobs that request the low quality of service are charged at a rate of 1/4 of the published charge factor for each CPU type, and they run only when the load on the cluster is very low. Jobs within this quality of service are not guaranteed to make a reservation and will be pushed down the queue by any standard priority job. As such, showstart cannot be relied upon to provide an accurate estimate of when the job will start. The initial priority for a job submitted with the low QoS is 1000.

The low quality of service can be used with any queue by adding the following flag to the qsub command or submission script

-l qos=low
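
For example, a shared-memory job submitted at the low quality of service might look like this; the queue, resource values, and script name are placeholders.

qsub -q shared -l nodes=1:ppn=4 -l qos=low my_job.sh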

Job Priorities

An initial static priority is applied to every job submitted to the queue based on the user's investment status. The total priority of a job is determined by the applied QoS, the number of CPUs requested, the amount of time the job has been waiting in the Idle state, and the XFactor. Each priority element is given a weight, and the total priority is calculated as

Priority = Initial
 + 200 * number of CPUs
 + 100 * XFactor
 +  10 * minutes in Idle state
  • In Moab the XFactor is calculated as 1 + <Queued Time> / <Requested Wall time>

This priority function most strongly favors the following types of jobs.

  1. Jobs submitted by investors

  2. Jobs that have waited the longest.

  3. Jobs that request a large number of cores.

  4. Jobs that have a large XFactor, essentially meaning jobs that do not request the full 6 days of walltime.

Use checkjob -v -v to determine how your job's priority has been calculated. In the example below, the job was submitted by an investor, requested 8 CPUs, and has been waiting 1249.4 minutes. The job requested 12 hours of walltime, which means its XFactor is 2.7. The total priority is 114364 and has been computed as

$> checkjob -v -v <jobID>
Job                    PRIORITY*   Cred( User:  QOS)  Serv(QTime:XFctr)   Res( Proc)
             Weights   --------       1(    1:    1)     1(   10:  100)     1(  200)
<jobID>                  114364    87.4(  0.0:10000)  11.2(1249.:  2.7)   1.4(  8.0)
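
Plugging the numbers from this example into the priority function above reproduces the reported total:

Priority = 100000            (initial priority, investor QoS)
         + 200 * 8      =   1600
         + 100 * 2.7    =    270
         +  10 * 1249.4 =  12494
                          ------
                          114364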

Fermi queues

Access to the Fermi queues is restricted. Please see the following page for more information.

Queue status

The command frank-avail can be used to determine the amount of available resources in any queue on Frank. Each line of output indicates the window over which the listed resources are available.

For example, the output below indicates that a job requesting 2 nodes with a walltime of less than 2 days would have to wait in the queue for approximately 9 hours. Note that for Shared Memory queues frank-avail cannot determine the largest number of available cores per node, only the total number of available cores and nodes.

>frank-avail dist_big
          Cores           Nodes       Wall time       Wait time       StartDate
          -----           -----    ------------    ------------  --------------
             16               1         2:59:27         5:56:53  17:19:39_02/27
             32               2      1:20:40:52         8:56:20  20:19:06_02/27
             48               3        20:29:44      2:05:37:12  16:59:58_03/01
            112               7         1:14:38      3:02:06:56  13:29:42_03/02
            128               8         4:20:57      3:03:21:34  14:44:20_03/02
            144               9         8:56:24      3:07:42:31  19:05:17_03/02
            160              10         9:19:05      3:16:38:55  04:01:41_03/03
            192              12        00:03:20      4:01:58:00  13:20:46_03/03
            256              16        00:13:09      4:02:01:20  13:24:06_03/03
            288              18         3:08:11      4:02:14:29  13:37:15_03/03
            384              24         5:12:33      4:05:22:40  16:45:26_03/03
            480              30         4:14:25      4:10:35:13  21:57:59_03/03
            576              36        INFINITY      4:14:49:38  02:12:24_03/04

Jobs

A job may remain queued, rather than start running, for several reasons:

  1. A technical error in the submission is preventing scheduling
  2. Your throttling limits have been reached
  3. There are no feasible processors available at this time

In order to determine why your job has been queued, please see the output of

checkjob <job-id> [-v -v]

Note: Many pages of output will be printed when using -v -v.

If MOAB determines that a job will never run, the job will be canceled. In that case, checkjob can be used for up to one day afterwards to determine why the job was canceled.

Please read the MOAB scheduler page for more details.