This page moved to the namespace services. Link: [[:services:ifi-cluster]].

The Department of Computer Science hosts a computing cluster for scientific workloads. Please also consider the [[https://www.uibk.ac.at/zid/systeme/hpc-systeme/#HDR3|HPC Services offered by our ZID]] as they may suit your requirements better.

====== Hardware ======
The system is located in the IFI server room (3W06) in the ICT Building. It consists of 17 nodes with the following specification:
  Motherboard: ASRock Fatal1ty X399 Professional Gaming
  CPU: AMD Ryzen Threadripper 2950X WOF
  Memory: G.Skill D4 128 GB 2400-15 Flare X K8
  SSD: WD Black SN750 500 GB, NVMe 1.3, Read: 3470 MB/s, Write: 2600 MB/s

Nodes gc**1**-gc**8** offer
  GPU: 4x ZOTAC GeForce RTX 2070 Blower, 8 GB (GDDR6, 256 Bit)
Nodes gc**9**-gc**16** offer
  GPU: 4x ASUS GeForce RTX 2070 SUPER TURBO EVO, 8 GB (GDDR6, 256 Bit)
Node gc**17** offers
  GPU: 1x NVIDIA TITAN RTX, 24 GB (GDDR6, 384 Bit)

The headnode (1x ASUS GeForce RTX 2070 SUPER TURBO EVO, 8 GB) has 2 TB of additional storage capacity that is accessible from all nodes.
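Once logged in, standard tools can be used to check what a node offers. A minimal sketch (the exact output depends on which node you are on):

<code bash>
# List the GPUs installed on the current node (e.g. 4x RTX 2070 on gc1-gc8)
nvidia-smi -L
# Show the CPU model (AMD Ryzen Threadripper 2950X)
lscpu | grep "Model name"
# Show the installed memory
free -h
</code>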

\\
====== Network ======
The nodes and storage are interconnected through a 10 Gigabit Ethernet switch. The upstream connection of the headnode is Gigabit Ethernet.

\\
====== System Configuration ======
  * All nodes run the latest Ubuntu Server LTS release, which is 18.04.2 (Bionic Beaver).
  * All nodes have the same home directories mounted.
  * SLURM is used as the job scheduler (see the quick check below).
  * [[ifi-auth|IFI Auth]] is used as the authentication backend.
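A quick way to confirm the release and that the SLURM client tools are available on a node (a minimal sketch using standard commands):

<code bash>
# Print the Ubuntu release the node is running
lsb_release -d
# Print the version of the installed SLURM tools
sinfo --version
</code>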

\\
====== Authorization & Authentication ======
To request access to ifi-cluster, users have to subscribe to the [[https://informatik.uibk.ac.at:2080/mailman/listinfo/ifi-cluster|mailing list]]. \\
After successful registration, you can log into the system with your IFI credentials using SSH on **ifi-cluster.uibk.ac.at**.
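For example (a minimal sketch; it assumes your login name follows the ''name.surname'' scheme also used for the home directories):

<code bash>
# Replace name.surname with your IFI username
$ ssh name.surname@ifi-cluster.uibk.ac.at
</code>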

\\
====== Storage ======
Your **home directory** /home/name.surname should not exceed 100 GB. Its permissions should be 700 unless you intend to share it.

<code bash> $ chmod 700 /home/name.surname</code>
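To see how much of the 100 GB you are currently using, a simple check is:

<code bash>
# Print the total size of your home directory in human-readable form
$ du -sh /home/name.surname
</code>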

An 18 TB HDD is available to the GPU cluster as **scratch space**. It uses the 1 Gbit network.

The mount point is ''/scratch'' on each node.

Please create a directory for yourself, named after your cluster username, and make it accessible only to you.

(If you need to share a directory, set 750 or 755 permissions for group/others.)

<code bash>
$ cd /scratch
$ mkdir name.surname
$ chmod -R 700 /scratch/name.surname
</code>

The **IFI-NAS storage** is mounted at ''/ifi-NAS///[your_group]//''. Information about IFI-NAS is available [[:services:ifi-nas|here]].

\\
**Mounting an external resource:**

First create your own directory:

<code bash>mkdir /mnt/users-mount/[your_dir]</code>

Then mount the external resource:
<code bash>sudo mount server:/resource /mnt/users-mount/[your_dir]</code>

When finished, please do not forget to unmount it and delete the directory:

<code bash>sudo umount /mnt/users-mount/[your_dir]
rm -rf /mnt/users-mount/[your_dir]</code>
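To check that the resource is actually mounted (and later that it is gone again), you can inspect the directory with ''df'', for example:

<code bash>
# Show which filesystem is currently mounted at your directory
$ df -h /mnt/users-mount/[your_dir]
</code>
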
===== Installed Software Packages =====
Ubuntu Distribution Packages:
  * Python 2.7
  * Python 3.6
  * gcc 7.4
  * GNU Make 4.1
Loadable Modules:
  * Open MPI 4.0.0 (''/software-shared'')
  * mpich 3.3.1 (''/software-shared'')

===== Modules =====
[[https://github.com/cea-hpc/modules|Environment Modules]] can be used to modify a user's environment. Use this to dynamically load modules in your ''sbatch'' script.

Basic commands:
  module avail -- list available modules
  module show [module name] -- show details about one module
  module help [module name] -- show help for one module

Example usage in an ''sbatch'' script:
  module load openmpi
  module unload openmpi
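Putting this together, a minimal sketch of an ''sbatch'' script that loads the Open MPI module before running a program (the partition, the resources and the program name ''./my_program'' are placeholders to adapt):

<code bash>
#!/bin/bash -l
#SBATCH --partition=IFIall
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --time=0-00:10:00

# Load the Open MPI environment module, run the (placeholder) MPI program, then unload it
module load openmpi
srun ./my_program
module unload openmpi
</code>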

\\
====== Topology ======
To be linked here soon...

\\
====== Usage ======

===== Show running Jobs =====
  squeue
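To list only your own jobs, you can filter by user, for example:

<code bash>
# Show only the jobs submitted by the current user
$ squeue -u $USER
</code>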

===== Submit Job =====

==== Batch Job ====

Write a configuration file similar to this basic example:
<file bash cluster-test.sh>
#!/bin/bash -l
#SBATCH --partition=IFIgpu
#SBATCH --job-name=firstDemo
#SBATCH --mail-type=BEGIN,END,FAIL ##optional
#SBATCH --mail-user=max.mustermann@uibk.ac.at ##optional
#SBATCH --account=your_group ##change to your group
#SBATCH --uid=your_username ##change to your username
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --mem=4G
#SBATCH --time=0-00:30:00
#SBATCH --output slurm.%N.%j.out # STDOUT
#SBATCH --error slurm.%N.%j.err # STDERR
srun /bin/hostname | /usr/bin/sort
</file>

^Option^Description^
| %%--partition=%% | specifies which partition your job should run on. Available partitions are: ''IFIall'' - for CPU computation, ''IFIgpu'' - for GPU computation, ''IFItitan'' - for large GPU computation on the NVIDIA TITAN RTX card |
| %%--job-name=%% | defines a name for your job (optional) |
| %%--mail-type=%% | sends an e-mail on the selected events. Valid type values are ''NONE'', ''BEGIN'', ''END'', ''FAIL'', ''REQUEUE'', ''ALL'' and others, see the official documentation (optional) |
| %%--mail-user=%% | specifies the user's e-mail address (any UIBK e-mail address works). If %%--mail-type%% is set to anything other than ''NONE'' and %%--mail-user%% is missing, e-mails are sent to the local user on the IFI cluster (check with the "mail" command on the headnode). For system-wide mail forwarding please check [[:knowledgebase:user-environment#email_forwarding_information|here]] |
| %%--account=%% | specifies the group you belong to |
| %%--uid=%% | specifies your username |
| %%--nodes=%% | defines how many nodes your job should run on. The scheduler will allocate this number of nodes to your job. |
| %%--ntasks-per-node=%% | requests that ''ntasks'' be invoked on each node |
| %%--mem=%% | specifies the real memory required per node; the default unit is megabytes. Different units can be specified using the suffix %%[K|M|G|T]%% |
| %%--time=%% | sets a limit on the total run time of the job allocation. If the requested time limit exceeds the partition's time limit, the job will be left in a PENDING state (possibly indefinitely). |
| %%--output=<filename pattern>%% | the batch script's standard output will be directed to the file named by the "filename pattern". By default both standard output and standard error are directed to the same file. |
| %%--error=<filename pattern>%% | the batch script's standard error will be directed to the file named by the "filename pattern" |

The command that you want to run:
<code bash>srun /bin/hostname | /usr/bin/sort </code>

Run the file with ''**sbatch %%--test-only <<filename>>%%**'' to check the syntax and with ''**sbatch <<filename>>**'' to submit it.
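For the example file above this would be:

<code bash>
# Validate the batch script without actually submitting it
$ sbatch --test-only cluster-test.sh
# Submit the job to the scheduler
$ sbatch cluster-test.sh
</code>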

=== Other valid and useful options ===

^Option^Description^
| %%--exclusive[=user]%% | The job allocation cannot share nodes with other running jobs (or, with the "=user" option, only with jobs of other users). |
| %%--gres=<list>%% | Specifies a comma-delimited list of generic consumable resources (GPUs). The format of each entry in the list is "name:type:count" (e.g. %%--gres=gpu:4%%). |
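For example, to request all four GPUs of a node you would add a line like the following to your batch script (a sketch; adjust the count to your needs):

<code bash>
#SBATCH --gres=gpu:4   ## request 4 GPUs per node
</code>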

There is also an {{ :public:ccompile.txt |example script}} for C compilation.

Use the official website documentation for details: https://slurm.schedmd.com/sbatch.html

==== Interactive Job ====

<code bash>srun /bin/hostname</code>

==== Interactive shell session ====
<code bash>srun --nodes=1 --nodelist=gc3 --ntasks-per-node=1 --time=01:00:00 --pty bash -i</code>
This gives you a shell on node gc3 for one hour.

===== Cancel Job =====
  scancel <<jobid>>
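The job ID is shown in the first column of the ''squeue'' output. For example:

<code bash>
# Cancel one job by its ID ...
$ scancel <<jobid>>
# ... or cancel all of your own jobs at once
$ scancel -u $USER
</code>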

The basic commands can be found in the [[https://slurm.schedmd.com/quickstart.html|SLURM User Guide]].

\\
====== Live Monitoring Tool ======

The live monitoring tool for the cluster can be found [[https://ifi-smokeping.uibk.ac.at/smokeping/?target=IFIcluster|HERE]].

\\
====== Contact ======
Contact ifi-sysadmin@informatik.uibk.ac.at if you have problems or further questions. For administration purposes there is also the internal documentation.

==== IFI Internal Chat ====
The IFI internal chat room for the GPU cluster can be accessed via [[https://chat.uibk.ac.at/#/room/#ifi-cluster:uibk.ac.at|Matrix Chat]].