FAQs

I cannot access the machines.

The most likely cause is the firewall; see off campus access.

I cannot access the machines, and I know it is not the firewall!

The server might be down. Check the current list of "Cluster Alerts" on the home page.

I cannot log in because of a password problem.

If you cannot log in to this website (for example, because you have forgotten your password), go to https://core.sam.pitt.edu/user, click the "Request new password" tab, and follow the instructions.

If you cannot connect to the Frank cluster because of what looks like a password issue (the cluster prompts for your username and password but won't let you in), note that Frank uses your Pitt credentials for authentication, that is, the ones you use to sign in at http://my.pitt.edu. Try updating your Pitt password to see if that solves the problem.

If you are still having login trouble, please let us know. If you cannot log in to CORE.SAM (this website), please send us a message here.

I have a problem/request. What should I do?

Good question! This is the primary reason this website exists. Here is a set of guidelines to help you make the best use of CORE.SAM:

  • Search the website with keywords to see if there is anything related.
  • Post your question to one of the Forums. Admins and other users actively follow the discussions, and you will get an answer.
  • If you think your problem simply needs to be acted upon by the admins, or you have been advised to do so in a Forum discussion, create a support ticket.

The support ticket form has the following fields:

Title [required]
Use a descriptive title. Do not use titles such as "Help me! I need software!"; be specific instead, for example "Install/Update Turbomole".
Vocabularies [optional]
This section helps categorize the content at large. Read the descriptions of the specific items to make correct selections.
Body [required]
Describe the problem concisely but in detail. For example, if you need an application installed, specify its name, URL (if applicable), and version.
Priority [optional]
Defaults to normal. If you need an urgent resolution, you may set this field to critical; please use this sparingly. Note that, while we will try our best, an urgent resolution is not guaranteed.

Right after you click the "Save" button, the post is saved on the website, administrators are notified, and the request appears as an item in the tickets list (its visibility depends on how you set "Access Control"). You will also receive an email notification with a link to the post for follow-ups. When an admin takes on the task, they will respond with a reply to this post on CORE.SAM, and you will again be notified by email.

When I try to log in, my browser warns about certificate errors!

This is normal. When you log in or visit user account pages on CORE.SAM, the connection is encrypted to protect your password and other user information. Because encryption means something sensitive is going on, the browser gets suspicious and checks whether it can really "trust" the website. Unfortunately, it does not trust SAM sites by default, since we did not purchase a "certificate" from a recognized "certificate authority". The machinery behind all this security is really neat; read about it online if you like. For now, please ignore the warnings, follow the instructions your browser presents, and add an exception for CORE.SAM.

Most browsers allow you to make the exception permanent. This is what you should prefer on your personal computer: repeated visits will no longer generate the warning, so if the warning does appear again, it may indeed indicate a security breach.

Why is my job not running?

If your job sits in the queue longer than expected without running, it could be for one of the following reasons:

  1. The resources requested for the job are not available. In other words, there aren't enough nodes or cores free at the moment for the job to run.

  2. You may have run into one of the hard limits for the queue. There are limits on the number of jobs (and cores) a user and/or a group can have running at any one time. Please refer to the Queues and Resources documentation to find out what these limitations are.

  3. There is a system error and the queue system is misbehaving.

You can find out more using checkjob:

# Given a job with ID 1234
checkjob 1234
 
# Or you can get more verbose information via
checkjob -v -v 1234

This command prints a lot of information, but the reason the job is not running is often given in one of the last lines. If in doubt, please don't hesitate to ask on the Forums or via a support ticket.

You can also get an estimate of when your jobs will start with the showstart command. If no time is given (or it says infinity), this could indicate a problem with the queue system or a system-wide reservation (for example, during a scheduled downtime). Please check the front page for cluster alerts.
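Mirroring the checkjob example above, here is a minimal sketch of asking for a start estimate for a job with ID 1234 (a placeholder; substitute your own job ID). The availability check around showstart is an illustrative addition, not part of the cluster tooling; it only makes the snippet fail gracefully on a machine where the scheduler commands are not installed.

```shell
#!/bin/sh
# Given a job with ID 1234 (placeholder; use your own job ID)
JOBID=1234

# Only call showstart if it is on the PATH (i.e. you are on the cluster);
# otherwise print a hint instead of a "command not found" error.
if command -v showstart >/dev/null 2>&1; then
    showstart "$JOBID"
else
    echo "showstart not found; run this on the cluster" >&2
fi
```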

How can I acknowledge SAM support?

You may use the following phrase:

"This research was supported in part by the University of Pittsburgh Center for Simulation and Modeling through the supercomputing resources provided." If you have received significant assistance from a SAM member that you would like to acknowledge, continue with: "We specifically acknowledge the assistance of [relevant staff members]."

Is there a short description of SAM facilities I can use in my proposal, research description, etc.?

We are happy to hear about such needs and to help with these write-ups. You may use the site contact form or (if you are a current user) ask on the Forums to request more information. The About page also contains a long, detailed description. Here is a short version (updated 07/02/2014):

"Computing Resources: Computational resources are available through the University's Center for Simulation and Modeling (SaM). The Center provides a state-of-the-art high performance computing (HPC) cluster for campus researchers. The cluster comprises 20 16-core Intel Ivy Bridge, 54 64-core AMD Interlagos, 106 16-core Intel Sandy Bridge, 51 12-core Intel Westmere, 110 8-core Intel Nehalem, and 23 48-core AMD Magny-Cours compute nodes, totaling 8068 computation-only CPU cores. The nodes have between 12 GB and 256 GB of shared memory per node, and the cluster has 1.5 PB of shared and scratch storage, including a Panasas parallel filesystem. Four of the 12-core nodes have a total of 16 general-purpose NVIDIA C2050 GPU accelerator cards. Six of the 12-core nodes have a total of 24 NVIDIA TITAN GPU accelerator cards. The nodes are connected via a fast, low-latency InfiniBand network fabric to enable efficient distributed parallel runs. The infrastructure is designed for future scaling via additional resources funded by national instrumentation grants, internal University funds, or faculty contributions from grants or start-up funds. The system is housed in the enterprise-level, state-of-the-art facilities provided by the University's Network Operations Center, and it is connected to the rest of the campus via a high-bandwidth fiber-optic gigabit network. SaM also employs full-time PhD-level consultants whose expertise covers a wide range of areas in HPC. The consultants are responsible for preparing training and educational material, teaching, cluster user support and consulting, and focused software development and research support for various projects at Pitt."