General Information
Who can access the HPC cluster?
Access to the HPC cluster is typically granted to researchers, faculty, and students who require high computational resources for their projects.
How do I request access to the HPC cluster?
To request access, please submit a request via ServiceNOW or email us at at@sfsu.edu. You may need to provide details about your project and computational needs.
What are the specifications of the HPC cluster?
The HPC cluster consists of multiple partitions with different specifications:
- cputest: 1 node, 2 X AMD EPYC 9534, 128 cores
- cpucluster: 2 nodes, 2 X AMD EPYC 9534, 256 cores
- gpucluster: 1 node, 2 X AMD EPYC 9334, 64 cores, 4 X NVIDIA A100 (80GB)
- login: 1 node, 1 X AMD EPYC 9124, 16 cores
Access and Usage
How do I log in to the HPC cluster?
For instructions on how to sign in to the cluster, please see the guide we created: Accessing the HPC cluster.
What software is available on the HPC cluster?
Currently, the installed software includes OpenMPI, Slurm, and CUDA v12.4. To request additional software installations on the cluster, please submit a request via ServiceNOW or email us at at@sfsu.edu.
How do I submit a job using Slurm?
For more information on how to use Slurm and submit jobs, please follow the link to our guide: Job Submission using Slurm. A minimal batch script example is shown below.
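A minimal batch script might look like the following (this is an illustrative sketch; the job name, time limit, output file, and program are placeholders, and the partition name comes from the specifications listed above):

    #!/bin/bash
    #SBATCH --job-name=myjob         # name shown in squeue (placeholder)
    #SBATCH --partition=cpucluster   # partition to run on (see sinfo for options)
    #SBATCH --ntasks=1               # number of tasks
    #SBATCH --time=01:00:00          # wall-clock time limit
    #SBATCH --output=myjob_%j.out    # output file; %j expands to the job ID

    srun hostname                    # replace with your own program

Save the script (for example, as myjob.sh) and submit it with sbatch myjob.sh. Refer to the guide above for cluster-specific options.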
How do I transfer files to and from the HPC cluster?
To transfer files, you can use the scp command. For example, to copy a file to the cluster:
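(The username, cluster address, and paths below are placeholders; substitute your own.)

    scp /path/to/local/file username@<cluster-address>:/path/on/cluster/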
To copy a file from the cluster:
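    scp username@<cluster-address>:/path/on/cluster/file /path/to/local/destination/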
Technical Details
How do I check what partitions are available and what their status is?
To check which partitions are available and their status, you can run the command sinfo. This will display a list of all partitions along with their current status.
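For example:

    sinfo

For a node-oriented view with more detail, you can run sinfo -N -l.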
What type of network is used in the HPC cluster?
The HPC cluster uses InfiniBand networking, featuring an HPE InfiniBand HDR/Ethernet 200Gb switch.
What are the available GPUs and their specifications?
We have 4 NVIDIA A100 (80GB) GPUs installed in the gpucluster partition.
How can I check the installed CUDA version on the GPUs in the HPC cluster?
To check the version of CUDA installed on the GPUs, you can run the nvidia-smi command.
For example, you can run the nvidia-smi command on the gpucluster partition:
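One way to do this is with srun (the resource request below is a minimal illustration; adjust it to your needs):

    srun --partition=gpucluster --gres=gpu:1 nvidia-smi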
This will display detailed information about the GPU, including the CUDA version, driver version, and other relevant statistics.
How can I use the nvidia-smi command in a script?
You can use the nvidia-smi command in a script to automate the process of checking GPU status and other relevant information.
Here’s an example of how you can do this in a Bash script:
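The following sketch appends a timestamped GPU status snapshot to a log file (the log file path is a placeholder; adjust it as needed):

    #!/bin/bash
    # Append a timestamped snapshot of GPU status to a log file.
    LOGFILE="$HOME/gpu_status.log"

    echo "=== GPU status at $(date) ===" >> "$LOGFILE"
    nvidia-smi >> "$LOGFILE" 2>&1

    # Query selected fields in CSV format for easier parsing.
    nvidia-smi --query-gpu=name,driver_version,memory.used,memory.total --format=csv >> "$LOGFILE"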
Troubleshooting and Support
Who do I contact for technical support?
For technical support, you can contact the Academic Technology Systems team by submitting a request via ServiceNOW or emailing at@sfsu.edu.
What should I do if I encounter an error while submitting a job?
If you encounter an error, check the job's output and error files for details. You can also consult the cluster's documentation or contact technical support for assistance.
How do I check the status of my job?
To check the status of your job, you can use the squeue command.
To check the status of all jobs:
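    squeue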
To check the status of your job or a job of a specific user:
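    squeue -u <username>

Replace <username> with the SFSU username of the user whose jobs you want to see.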
How do I cancel a job that I’ve already submitted?
To cancel a job, you can use the scancel command.
First, find your job using the squeue command:
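    squeue -u <username>

Note the JOBID of the job you want to cancel.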
Then, use scancel to cancel your job:
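    scancel <jobid>

Replace <jobid> with the job ID reported by squeue.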