Integrating with the CLCbio Genomics Server

This page contains directions on how to connect your CLCbio Genomics Workbench to the CLCbio Genomics Server installation on Frank, allowing you to offload analyses to the cluster.

Compatibility

We generally maintain two CLCbio servers, clcbio.sam.pitt.edu and clcbio-stage.sam.pitt.edu. Both currently run CLC Genomics Server 9.1.1.

Biomedical Genomics Server Extension, CLC Genome Finishing Server Extension and CLC Microbial Genomics Server Extension are enabled on clcbio.sam.pitt.edu.

CLC Assembly Cell 5.0.3 is available on the HTC cluster.

The following are the corresponding clients for CLC Genomics Server 9.1.1:

CLC Genomics Workbench 10.1.1

Biomedical Genomics Workbench 4.1.1

CLC Command Line Tools 4.1.1

We recommend running the client versions that correspond to the installed CLC Genomics Server. However, CLC Genomics Workbench 10.0.0, 10.0.1 and 10.1, Biomedical Genomics Workbench 4.0 and 4.1, and CLC Command Line Tools 4.0 and 4.1 can also connect to CLC Genomics Server 9.1.1. Tools that have changed between versions cannot be launched when using a compatible, but not corresponding, client-server combination.
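The distinction between corresponding and compatible clients can be sketched as a small lookup. This is only an illustration: the version strings are copied from the lists above, while the function name `client_status` and the three status labels are assumptions made for this example and are not part of any CLC software.

```python
# Client versions for CLC Genomics Server 9.1.1, copied from the text above.
CORRESPONDING = {
    "CLC Genomics Workbench": {"10.1.1"},
    "Biomedical Genomics Workbench": {"4.1.1"},
    "CLC Command Line Tools": {"4.1.1"},
}
COMPATIBLE = {
    "CLC Genomics Workbench": {"10.0.0", "10.0.1", "10.1"},
    "Biomedical Genomics Workbench": {"4.0", "4.1"},
    "CLC Command Line Tools": {"4.0", "4.1"},
}


def client_status(product: str, version: str) -> str:
    """Classify a client version against CLC Genomics Server 9.1.1.

    Hypothetical helper: returns "corresponding" (recommended),
    "compatible" (connects, but tools that changed between versions
    cannot be launched), or "not listed".
    """
    if version in CORRESPONDING.get(product, set()):
        return "corresponding"
    if version in COMPATIBLE.get(product, set()):
        return "compatible"
    return "not listed"


if __name__ == "__main__":
    for product, version in [
        ("CLC Genomics Workbench", "10.1.1"),
        ("CLC Genomics Workbench", "10.0.1"),
        ("CLC Command Line Tools", "3.6"),
    ]:
        print(product, version, "->", client_status(product, version))
```

A version reported as "not listed" is simply absent from the lists on this page; the sketch makes no claim about whether such a client can connect.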

Plugin notes

The Advanced RNA-Seq plugin has been retired. The tools from this plugin have been integrated into the software.

Server plugins (clcbio.sam.pitt.edu and clcbio-stage.sam.pitt.edu)

Additional Alignments

Annotate with GFF

Bisulfite Sequencing

HistoneChIP-Seq

Ingenuity Pathway Analysis

Beta plugins (clcbio.sam.pitt.edu and clcbio-stage.sam.pitt.edu)

Advanced Peak Shape Tools

Transcript Discovery

Biomedical-enabled CLC Genomics Servers only (clcbio.sam.pitt.edu)

Ingenuity Variant Analysis

QIAGEN Gene Read Panel Analysis

Commercially available Server Extensions (clcbio.sam.pitt.edu)

CLC Genome Finishing Module

CLC Microbial Genomics Module

CLC Workbench download links

CLC Genomics Workbench

Version: 10.1.1 - Release date: 22. Jun 2017

Download Mac OS X 10.7 or later - 220.9 MB (.dmg) http://download.clcbio.com/CLCGenomicsWorkbench/10.1.1/CLCGenomicsWorkbe...

Download Linux (RedHat/SuSE) installer - 64bit - 219.9 MB (.sh) http://download.clcbio.com/CLCGenomicsWorkbench/10.1.1/CLCGenomicsWorkbe...

Download Windows - 64bit - 170.1 MB (.exe) http://download.clcbio.com/CLCGenomicsWorkbench/10.1.1/CLCGenomicsWorkbe...

Biomedical Genomics Workbench

Version: 4.1.1 - Release date: 22. Jun 2017

Download Mac OS X 10.7 or later - 223.4 MB (.dmg) http://download.clcbio.com/BiomedicalGenomicsWorkbench/4.1.1/BiomedicalG...

Download Linux (RedHat/SuSE) installer - 64bit - 222.0 MB (.sh) http://download.clcbio.com/BiomedicalGenomicsWorkbench/4.1.1/BiomedicalG...

Download Windows - 64bit - 172.5 MB (.exe) http://download.clcbio.com/BiomedicalGenomicsWorkbench/4.1.1/BiomedicalG...


  1. Ensure you have the most up-to-date version of the CLCbio Genomics Workbench (the software should tell you if there's a more recent version when you start it, or you can check this page on the CLCbio website)

  2. If you have not already done so, request a user account/allocation on the Center for Simulation and Modeling (SAM) cluster by filling out the required information on this page

  3. If your computer is not connected to the Pitt network (e.g. you are working from home or on a trip), or you are working from a laptop that is connected to the Pitt wireless system, make sure you set up Pitt SSLVPN so that you can communicate with the Center for Simulation and Modeling (SAM) cluster (the CLCbio servers run on the HTC cluster)

  4. Start up the CLC Genomics Workbench

  5. If you have not done so already, install the CLC Workbench Client Plugin by clicking on the Plug-ins button in the toolbar at the top of the CLC Genomics Workbench window. This will bring up the Manage Plug-ins and Resources dialog box. Find the CLC Workbench Client Plugin, click the Download and Install button, then close the Manage Plug-ins and Resources dialog box and restart the CLC Genomics Workbench (choose Yes when the dialog box asks if you want to restart the workbench now)

     

  6. From the File menu, choose the "CLC Server Login" option. Click the triangle next to "Advanced" to find the server information section. The Server host is clcbio.sam.pitt.edu, and the Server port is 7777. Fill in your Pitt username and password, then check the boxes to have this information saved and to have the software automatically log in to the server (assuming the software you are using is on your own computer, and not a publicly accessible machine). Please note that the username is case sensitive and consists entirely of lowercase letters. Refer to the image below for an example of how the settings in this box should look:
       

  7. Your workbench software will now attempt to connect to the CLCbio Genomics Server installation on the SaM cluster. One of the few noticeable changes will be the appearance of new folders in your Navigation Area. You can find one folder named CLC_Server_Data with a blue S on the folder icon:

     

    This is the data folder on the SaM cluster, and inside it you will find the folders for your group, to which you should have access (the naming convention is the first letter of the first name followed by the last name of the faculty member):

    This folder is your group's working directory. Copying files in the workbench from your local folders to the folders on the server will copy your data over to Frank. File permissions have been set to restrict access to your data to members of your group only. If you need any special permissions, or if you do not find a folder matching your group, please open a support ticket on the SAM mainpage.

  8. Running an analysis on the HTC cluster operates in much the same fashion as running an analysis on your own computer; however, in the dialog box that opens (when you first select a tool to run), you will now see additional options:

    To run on the HTC cluster, always select the "Grid" option (do not attempt to run analyses using the "CLC Server" option as, counterintuitively, these will fail). The drop-down menu under the "Grid" option allows you to select an appropriate grid preset, which controls how many cores are assigned to your job and how long the job is allowed to run:

    In our experience, most jobs do not require more than 24 hours to complete (in fact, most finish in less than 4 hours). Aligning large exome data sets to a reference genome can typically be done using 24 cores in about 2 hours (even data sets with up to 100x coverage). Aligning whole genome data sets (especially those with high coverage) is best done with 48 cores, and will typically require less than 24 hours (recent alignments of 100x whole genome data - nearly 1 billion reads - have been completed in 6 hours using 48 cores, and even larger data sets - 1.5 billion reads - completed in 15 hours using 48 cores). Note, however, that variant calling requires much more time than alignment (sometimes almost twice as much), but does not use as many cores. In our experience, variant calling for whole exome data sets typically takes on the order of 6 hours (using 6 cores), while variant calling for whole genome data sets takes closer to 30 hours (using 6 cores). Minimizing the number of cores your jobs use and the amount of time blocked off for your jobs is essential, as the resources currently available to the CLC server are limited.

    If you think your job requires a grid preset that is not currently available, please send Dr. Fangping Mu an email: fangping@pitt.edu

  9. Occasionally (such as when you are running an import tool), you will also see a dialog box asking you where your data is located:

    Your selection here determines which folders can be searched for files in the subsequent steps of the tool. Import tools can be used to simultaneously convert data from FASTQ format (for example) to the CLCbio format and transfer the CLCbio format file to the server. We can assign each group (faculty) an import/export directory on mobydisk under /mnt/mobydisk/groupshares/. Members of the group share this import/export directory with read/write permissions.

    Please open a support ticket on the SAM mainpage if you do not find a folder matching your group.

  10. Once you start a job running on the HTC cluster, you will see the usual progress bars in the Process section of the Toolbox. When the job status is listed as "Running", you can close your Workbench software, and the job will continue running on the remote server. When you relaunch your workbench, it will again connect to the server (as long as you checked "Automatic login" above - otherwise you can manually log in again), and the status of your job will be updated.

  11. Working directories and import/export directories are located on /mnt/mobydisk. Note that /mnt/mobydisk is not backed up, so you will need to be diligent about backing up your data to your own personal drives.

  12. At the moment, the CLCbio software does not provide fine control of data access at the individual user level. Access permissions are enforced at the group level. This means that if User_A and User_B are both within Group_Z, then both will have read/write access to data stored within the Group_Z directory.

  13. Each group from the schools of health sciences is assigned a group quota of 2TB on mobydisk. If your group requires more disk space on mobydisk, please refer to the purchase options (http://core.sam.pitt.edu/node/618)

  14. If you have any problems with this procedure, or your jobs will not execute, please either send an email to Dr. Fangping Mu (fangping@pitt.edu) or submit a SAM support ticket online (login required)
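Before reporting a connection problem, it can help to confirm that the server host and port from step 6 are reachable from your computer (for example, that the VPN from step 3 is active). The following is a minimal sketch using only the Python standard library. The function name `can_connect` and the 5 second timeout are assumptions made for this example. The check only tests whether a TCP connection can be opened; it does not verify your login or that the CLC server itself is healthy.

```python
import socket

# Host and port as given in step 6 of this page.
CLC_HOST = "clcbio.sam.pitt.edu"
CLC_PORT = 7777


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for troubleshooting only. Any network-level
    failure (DNS lookup error, refused connection, timeout) is
    reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_connect(CLC_HOST, CLC_PORT)` from the computer that runs your workbench returns False when the port cannot be reached, in which case the network or VPN connection is the first thing to check. If it returns True but the workbench still cannot log in, include that detail when you contact support.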