Queues, limits & defaults

Universally available queues

There are three queues available to all users on Frank, called batch, gpu and test.

The Section Job Submission provides details on how to submit to the queues described below.

batch

The batch queue is the default queue and is available to all Frank users. Jobs in the batch queue run on either the AMD Magny Cours CPU nodes (both the Infiniband and gigabit-ethernet nodes) or the Intel Westmere CPU nodes. Most users' jobs will run well on either CPU type, but a specific CPU can be requested as described in the Section Running on a specific hardware type.
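
A minimal batch-queue submission script might look like the sketch below, assuming the Torque/PBS directive syntax used by qsub on Frank; the core count, walltime and program name are placeholders only:

#!/bin/bash
# Request 4 cores on one node and 2 hours of wallclock time in the default queue.
#PBS -q batch
#PBS -l nodes=1:ppn=4
#PBS -l walltime=02:00:00
#PBS -j oe

cd $PBS_O_WORKDIR
./my_program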

gpu

The gpu queue runs on the four nodes that have NVIDIA Tesla C2050 (fermi) cards. Use of this queue is restricted to those users who have an allocation on these nodes. To request the use of this queue, please post a Support Ticket.
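
Once access has been granted, jobs are directed to these nodes by naming the queue at submission time, for example (the script name and resource values are placeholders):

$ qsub -q gpu -l nodes=1:ppn=1,walltime=12:00:00 my_cuda_job.sh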

test

The test queue is intended for short jobs to verify programs and input.

Jobs submitted to the test queue will run on nodes with Intel Xeon (Nehalem) CPUs. There are 8 cores per node, and 2 nodes available in the test queue. To test an application on more than 16 cores, please submit to the batch queue.

There is a maximum wallclock limit (see below) of 10 minutes on test.
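
A short verification run can be submitted directly to the test queue with qsub; the script name below is a placeholder, and the requested walltime must stay within the 10-minute cap:

$ qsub -q test -l nodes=1:ppn=8,walltime=00:10:00 verify_run.sh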

Restricted access queues

There are several queues on Frank which are restricted to use by select groups. These queues run on hardware that was carried over from a previous cluster, Fermi, which was purchased using funds from these research groups.

If you are not a member of these Fermi research groups, you will probably not be able to submit a job to these queues. The error upon job submission will look something like the following:

$ qsub -I -q kohn
qsub: Unauthorized Request  MSG=group ACL is not satisfied: user user@login0.frank.sam.pitt.edu, queue kohn

jordan

The jordan queue runs on the Magny Cours nodes, and is only available to members of the Jordan research group.

mem48g

Jobs submitted to the mem48g queue will run on nodes with the Intel Xeon (Nehalem) CPUs. All eight of these nodes have 48 GiB of memory and 1 TiB of local disk installed. Please refer to the Section System Architecture for more details on these mem48g Nehalem nodes (n59-n66).
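
If you are a member of an authorized group, a large-memory job might be submitted to this queue as in the sketch below; the memory figure, walltime and program name are placeholders, and the directives assume the same Torque/PBS syntax used elsewhere on Frank:

#!/bin/bash
#PBS -q mem48g
#PBS -l nodes=1:ppn=8
#PBS -l mem=40gb
#PBS -l walltime=24:00:00

cd $PBS_O_WORKDIR
./my_large_memory_program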

kohn

Jobs submitted to the kohn queue will run on nodes with the Intel Xeon (Harpertown) CPUs. The Harpertown nodes have a variable amount of memory and local disk installed. Please refer to the Section System Architecture for more details on these Harpertown nodes.

mem24g

Jobs submitted to the mem24g queue will run on nodes with the Intel Xeon (Nehalem) CPUs. All forty of these nodes have 24 GiB of memory and 1 TiB of local disk installed. Please refer to the Section System Architecture for more details on these mem24g Nehalem nodes (n113-n152).

one_day

Jobs submitted to the one_day queue will run on nodes with the Intel Xeon (Nehalem) CPUs. There are 92 of these nodes altogether, and they have a variable amount of memory installed. Please refer to the Section System Architecture for more details on these one_day nodes (n67-n158).

westmere

Jobs submitted to the westmere queue will run on nodes with the Intel Xeon (Westmere) CPUs. There are 8 of these nodes, and all have 48 GiB of memory installed. Please refer to the Section System Architecture (http://core.sam.pitt.edu/node/602) for more details on these westmere nodes (n163-n170).

hugen2080

This queue is restricted to use by students of the HUGEN2080 class.

che3935

This queue is restricted to use by students of the CHE3935 class.

Queue limits

There are several limits on users' jobs on Frank. Exceeding these limits will prevent jobs from starting in the queue.

Wallclock limit

There is a wallclock limit for each of the different hardware types on Frank:

Node type                  Network      Time/hours
AMD Magny Cours            Infiniband   144
AMD Magny Cours            Gigabit      144
NVIDIA Fermi               Infiniband   144
Intel Westmere             Infiniband   144
AMD Magny Cours (jordan)   Gigabit      144
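
The wallclock time for a job is requested with the walltime resource; a request longer than the 144-hour ceiling will not be scheduled. For example (the script name is a placeholder):

$ qsub -l walltime=144:00:00 long_job.sh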

CPU core limit

(this Section is out of date)

There is also a limit to the total number of CPU cores that a single user can use at any one time. This limit is currently set to 128 cores on the batch queue and 200 cores on the jordan queue.
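
As an illustration of how the per-user limit is counted (the resource strings and script names are only examples), the two jobs below together account for the full 128-core batch allocation, so a third job submitted while both are running would wait in the queue:

$ qsub -l nodes=12:ppn=8 job_a.sh    # 12 x 8 = 96 cores
$ qsub -l nodes=4:ppn=8 job_b.sh     # 4 x 8 = 32 cores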

Memory limit

The maximum amount of memory that a job can use is given by the total memory available on the node. The limits per node type are as follows:

Node type                  Network      Memory/GB
AMD Magny Cours            Infiniband   128
AMD Magny Cours            Gigabit      256
NVIDIA Fermi               Infiniband   48
Intel Westmere             Infiniband   48
AMD Magny Cours (jordan)   Gigabit      64-256

For the NVIDIA Fermi nodes, the memory limit refers to the host memory, rather than the GPU device memory.

For the jordan AMD Magny Cours nodes, there are 2 nodes with 256 GB, 3 nodes with 128 GB and 3 nodes with 64 GB of RAM.
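
Memory is normally requested per process with the standard Torque/PBS pmem resource (or for the whole job with mem); the figures below are only an example. A request of 8 processes at 5 GB each (40 GB in total) fits on a 48 GB Westmere or Fermi node, but a request exceeding the memory of every node in the chosen queue will never start:

#PBS -l nodes=1:ppn=8
#PBS -l pmem=5gb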

Queue defaults

Jobs submitted to the queues inherit a default value for some of the resources, unless otherwise specified in the job submission. These defaults are as follows:

Resource              Value
Number of CPU cores   1
Wallclock time        10 minutes (00:10:00)
Memory per process    1.3 GB

NB If your jobs require more than these default resources, you must modify these values in the job submission script, or the jobs may be terminated prematurely by the queueing system. Please see the Sections Job Submission and Job Submission Examples for more information on modifying the queue default values.
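
For instance, a script that overrides all three defaults might contain the following directives (the particular values and program name are placeholders):

#!/bin/bash
# Ask for 4 cores, 24 hours of wallclock time and 2 GB of memory per process
# instead of the 1-core / 10-minute / 1.3 GB defaults.
#PBS -l nodes=1:ppn=4
#PBS -l walltime=24:00:00
#PBS -l pmem=2gb

cd $PBS_O_WORKDIR
./my_program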