Tutorial: Add Cluster support
The Epos framework supports the direct integration of compute clusters based on the Sun Grid Engine (SGE). Integrating a remote cluster allows you to submit all jobs directly to the cluster and fetch the results later. For this to work, you need SSH access to a remote machine that is part of the cluster. The machine must be a Submit Host, which means that the SGE commands qsub, qdel and qstat must be installed on it. If you have already used your cluster, this is typically the machine from which you submit your jobs.
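Before opening the Epos wizard, it can help to confirm that the machine really is a Submit Host. The following is a minimal sketch, assuming a POSIX shell on the remote machine; the function name missing_tools and the address user@cluster.example.org are placeholders chosen for illustration and are not part of Epos or SGE.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on the PATH.
# Prints nothing if all commands are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Hypothetical usage, after logging in to the submit host:
#   ssh user@cluster.example.org      # placeholder address
#   missing_tools qsub qdel qstat     # no output means all three were found
```

Note that a non-interactive SSH session may not load the same environment as a login shell, so a tool can be reported as missing even though it is installed. In that case the full paths have to be looked up on the cluster itself.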
Here we will go through the steps of configuring a remote machine as an execution location in Epos. This allows you not only to run single instances of your jobs remotely, removing load from your local machine, but also to submit MPI jobs, e.g. RaxML or MrBayes. You will also be able to shut down Epos on your local machine without interrupting job execution. All results can be fetched later. Note that you can also use the Epos UI to configure your cluster and then use the configured cluster from within Epos Scripts, for example, to submit a set of jobs without loading the data into your Epos workspace.
Let us quickly go through the steps to configure the remote cluster in Epos. The job and location management can be found on the right side of the main toolbar or in the main menu under Jobs. You will see two tabs, one that lists the jobs and one that lists all configured locations. Click on the Locations tab and then on the plus button in the lower left corner.
This opens the “New Location” Wizard. You have to fill out the form to integrate your cluster. Specify a unique name that identifies your cluster, the remote host address and SSH port, your user name and password, and two folders: Remote Folder is the location where Epos stores remote job data, and Installation Folder is the directory that Epos uses to install executables and tools on the cluster. Finally, check the update interval and the cluster architecture.
After you click Next, Epos connects to the remote machine and tries to figure out the locations of the SGE tools. There is a good chance that Epos can detect the right locations automatically, but if one of the paths cannot be identified correctly, you will have to specify it manually. These paths are the most crucial part of the cluster configuration. They point to the executables Epos uses when submitting or deleting jobs or checking their status. If the automatic detection did not work, you can also start by specifying just the SGE_ROOT location and hitting Validate. Epos will then try to figure out the paths to the commands based on the specified SGE_ROOT.
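If you need to enter the paths by hand, it helps to know where a standard SGE installation keeps its binaries. The following sketch assumes the usual layout, in which the commands live in $SGE_ROOT/bin/<arch>/ and the script $SGE_ROOT/util/arch prints the architecture string. It is not necessarily how Epos performs its own detection, and the function name sge_tool_paths is made up for illustration.

```shell
#!/bin/sh
# Print the candidate paths of qsub, qdel and qstat
# for a given SGE_ROOT ($1) and architecture string ($2).
sge_tool_paths() {
    sge_root=$1
    arch=$2
    for tool in qsub qdel qstat; do
        echo "$sge_root/bin/$arch/$tool"
    done
}

# Hypothetical usage on the submit host, assuming SGE_ROOT is set there:
#   arch=$("$SGE_ROOT/util/arch")       # architecture string of this host
#   sge_tool_paths "$SGE_ROOT" "$arch"  # paths to copy into the wizard
```

If your site uses a non-standard layout, running "command -v qsub" in a login shell on the submit host is a simple alternative for finding the path.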
The next section allows you to configure the locations of remote tools. For example, to configure ClustalW, open the ClustalW tab and specify the full path to the ClustalW executable. You can also use the Auto-Install function for one or all tools, but note that typically you want to specify paths to local installations that already exist.
Note that the latter is especially important if you want to submit MPI jobs, e.g. RaxML or MrBayes. Specify the paths to the cluster's MPI versions of the tools, because the Auto-Install feature does not include MPI versions of the tools. The last tool in the list is Epos itself. You need a copy of Epos on the cluster to submit algorithms that are not based on external tools, e.g. the Neighbor Joining implementation.
Finally, you have to adapt the SGE execution scripts. Three scripts are available by default: Epos, External Tool and MPI. The first script is used to submit algorithms that run in Epos and do not use an external tool. In contrast, the External Tool script runs jobs based on external tools, while the MPI script is used to submit MPI jobs. You can use SGE parameters within the script template. For example, if you want a specific queue, say all.q, to be used for your job, add
#$ -q all.q
to the script. This is even more important for MPI jobs. You probably have to make sure that a proper version of mpirun is used and that the MPI libraries can be found. When you use SGE’s tight integration, you probably also want to specify
#$ -pe mpi 24
where mpi is the name of your parallel environment and 24 is the number of slots you are requesting.
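To see how these directives fit together, here is a minimal sketch of a helper that prints a complete MPI job script. It is not the template shipped with Epos; the function name mpi_job_script and the tool path in the usage example are placeholders. The sketch assumes that mpirun is on the PATH of the execution hosts and relies on two standard SGE features: the -cwd option, which runs the job in the submission directory, and the NSLOTS variable, which SGE sets to the number of granted slots.

```shell
#!/bin/sh
# Print an SGE job script that requests a queue and a parallel environment.
# Arguments: queue name, parallel environment name, slot count, command line.
mpi_job_script() {
    queue=$1
    pe=$2
    slots=$3
    shift 3
    cat <<EOF
#!/bin/sh
#$ -q $queue
#$ -pe $pe $slots
#$ -cwd
# NSLOTS is set by SGE to the number of slots granted to the job.
mpirun -np \$NSLOTS $*
EOF
}

# Hypothetical usage (the tool path is a placeholder):
#   mpi_job_script all.q mpi 24 /path/to/mpi_tool > job.sh
```

The printed lines starting with #$ are the ones you would carry over into the MPI script template. If the cluster's MPI libraries are not found by default, the script is also the place to set variables such as LD_LIBRARY_PATH before the mpirun call.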
When everything is configured, hit Finish and your remote location is created. Now you can start algorithms and, in the lower right corner of the algorithm window, choose the location you want to submit jobs to. If you select a remote location, the submission dialog will ask you for a name for the remote job and let you select and customize the SGE script you want to use for execution.