The cluster uses the batch scheduling software Slurm for cluster management and job scheduling. Because several users may be signed in to the head node and submitting jobs at the same time, Slurm manages job submission, job scheduling, and resource allocation.
Partitions:
The cluster has three partitions to which jobs can be submitted:
Runtimes:
To check the maximum runtime of each partition, run the ‘sinfo’ command. It lists all available partitions along with their state and maximum runtime (the TIMELIMIT column).
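As a rough sketch of that check, the snippet below runs ‘sinfo’ once with its default columns and once with a custom format string that keeps only the partition name, time limit, availability, and node count. The choice of columns is only an example, and the guard around the calls is an addition for illustration so the snippet exits cleanly on a machine without Slurm.

```shell
# Format string for sinfo: %P = partition, %l = time limit,
# %a = availability, %D = node count (example selection of columns).
SINFO_FORMAT="%P %l %a %D"

if command -v sinfo >/dev/null 2>&1; then
    # Default view: PARTITION, AVAIL, TIMELIMIT, NODES, STATE, NODELIST.
    sinfo
    # Compact view with only the columns chosen above.
    sinfo --format="$SINFO_FORMAT"
else
    echo "sinfo not found; run this on the cluster head node" >&2
fi
```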
Submitting a Job:
To submit a job to the Slurm scheduler, pass a job script to the ‘sbatch’ command.
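A minimal sketch of a submission is shown here. The file name job_script.sh, the job name, the output file pattern, and the five-minute time limit are all placeholders chosen for illustration, not values taken from this cluster's configuration.

```shell
# Write a small job script (hypothetical name and contents).
cat > job_script.sh <<'EOF'
#!/bin/bash
# Name shown in the queue.
#SBATCH --job-name=example
# Output file; %j expands to the job ID.
#SBATCH --output=example_%j.out
# Wall-clock limit in HH:MM:SS.
#SBATCH --time=00:05:00

hostname
EOF

# Submit the script; on success sbatch prints the assigned job ID.
if command -v sbatch >/dev/null 2>&1; then
    sbatch job_script.sh
else
    echo "sbatch not found; run this on the cluster head node" >&2
fi
```

The ‘#SBATCH’ lines are read by ‘sbatch’ as options, so anything set there does not need to be repeated on the command line.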
With the ‘sbatch’ command you can also submit the job script together with specific resource requests. For example, passing --nodes=2 and --ntasks-per-node=128 submits a job that will run on 2 nodes with 128 tasks on each node while using the specified job script.
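The request described above could be written as follows. The script name job_script.sh is a placeholder and is assumed to exist in the current directory; the variable names are only for illustration.

```shell
# Hypothetical job script name; replace with your own file.
JOB_SCRIPT="job_script.sh"

# Resource request from the text: 2 nodes, 128 tasks on each node.
NODES=2
TASKS_PER_NODE=128
TOTAL_TASKS=$((NODES * TASKS_PER_NODE))
echo "Requesting $TOTAL_TASKS tasks in total"

if command -v sbatch >/dev/null 2>&1 && [ -f "$JOB_SCRIPT" ]; then
    sbatch --nodes="$NODES" --ntasks-per-node="$TASKS_PER_NODE" "$JOB_SCRIPT"
else
    # Without Slurm or the script, only show the command that would run.
    echo "sbatch --nodes=$NODES --ntasks-per-node=$TASKS_PER_NODE $JOB_SCRIPT"
fi
```

Options given on the command line take precedence over matching ‘#SBATCH’ lines inside the script.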
To submit a job to a specific partition, pass the partition name to ‘sbatch’ with the --partition (-p) option.
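A sketch of a partition-specific submission follows. The partition name "debug" and the script name job_script.sh are hypothetical; substitute one of the partition names that ‘sinfo’ reports on this cluster.

```shell
# Hypothetical values; use a partition name reported by sinfo.
PARTITION="debug"
JOB_SCRIPT="job_script.sh"

if command -v sbatch >/dev/null 2>&1 && [ -f "$JOB_SCRIPT" ]; then
    sbatch --partition="$PARTITION" "$JOB_SCRIPT"
else
    # Without Slurm or the script, only show the command that would run.
    echo "sbatch --partition=$PARTITION $JOB_SCRIPT"
fi
```

The same choice can be made inside the job script with a line of the form ‘#SBATCH --partition=<name>’.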
Difference between ‘sbatch’ and ‘srun’:
Use ‘sbatch’ to submit batch job scripts to be scheduled and run later.
Use ‘srun’ to run tasks interactively, or within a batch job script to launch parallel tasks.
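The two roles can be illustrated with a short sketch. The script name parallel_job.sh, the node and task counts, and the use of hostname as the workload are assumptions made for the example.

```shell
# Write a batch script (hypothetical name) whose workload is launched by srun.
cat > parallel_job.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:05:00

# Inside the allocation, srun starts one copy of the command per task
# (2 nodes x 4 tasks = 8 copies of hostname).
srun hostname
EOF

if command -v sbatch >/dev/null 2>&1; then
    # Batch use: queue the script and return to the prompt immediately.
    sbatch parallel_job.sh
    # Interactive use: request 4 tasks, wait for them to start,
    # and print their output in this terminal.
    srun --ntasks=4 hostname
else
    echo "Slurm commands not found; run this on the cluster head node" >&2
fi
```

In the batch case the output goes to a file once the job runs, while the interactive ‘srun’ call blocks until resources are available.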
Checking Job Status:
To check the status of your submitted jobs, use the ‘squeue’ command.
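One way to limit the listing to your own jobs is sketched below. The variable TARGET_USER is introduced only for the example and falls back to the output of ‘id -un’ if USER is not set.

```shell
# User whose jobs should be listed (defaults to the current user).
TARGET_USER="${USER:-$(id -un)}"

if command -v squeue >/dev/null 2>&1; then
    # Default columns: JOBID, PARTITION, NAME, USER, ST, TIME, NODES,
    # NODELIST(REASON). ST is the job state, e.g. PD (pending) or R (running).
    squeue --user="$TARGET_USER"
else
    echo "squeue not found; run this on the cluster head node" >&2
fi
```

Running ‘squeue’ with no options lists the jobs of all users instead.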
To learn more about Slurm, visit the following links: