Install From Binary
Important
(All Nodes) means every type of node should install this component.
(Manager Node) means only the manager node should install this component.
(Login Node) means only the login (auth) node should install this component.
(Cmp) means only the compute nodes should install this component.
Typically, three kinds of nodes are required to run Slurm: one manager (Manager Node), one login node (Login Node), and N compute nodes (Cmp). You can also choose to install all services on a single node.
Prerequisites
- change hostname (All Nodes)

  ```bash
  hostnamectl set-hostname (manager|auth|computeXX)
  ```

- modify /etc/hosts (All Nodes)

  ```bash
  echo "192.aa.bb.cc (manager|auth|computeXX)" >> /etc/hosts
  ```

- disable firewall, selinux, dnsmasq, and swap (All Nodes). More detail here.
- NFS server (Manager Node). NFS is used as the default file system for the Slurm accounting database.
- NFS client (All Nodes). All nodes should mount the NFS share.
- Munge (All Nodes). The auth/munge plugin will be built if the MUNGE authentication development library is installed. MUNGE is used as the default authentication mechanism; see the first sketch after this list.
- Database (Manager Node). MySQL support for accounting will be built if the MySQL or MariaDB development library is present. A currently supported version of MySQL or MariaDB should be used; see the second sketch after this list.
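A minimal sketch of the MUNGE setup mentioned above, assuming the munge packages come from the distribution repositories; the `<$rest_node>` placeholder follows the convention used later in this guide:

```bash
# (All Nodes) install MUNGE and its development library
yum -y install munge munge-libs munge-devel

# (Manager Node) generate the shared key and copy it to every other node
/usr/sbin/create-munge-key
scp -p /etc/munge/munge.key root@<$rest_node>:/etc/munge/munge.key

# (All Nodes) fix ownership/permissions and start the daemon
chown munge: /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
systemctl start munge && systemctl enable munge

# verify: a credential minted on one node should decode on another
munge -n | ssh <$rest_node> unmunge
```

And a hedged sketch of the accounting database setup; the database name `slurm_acct_db` and the password `some_pass` are placeholders that must match `slurmdbd.conf` later:

```bash
# (Manager Node) install and start MariaDB
yum -y install mariadb-server
systemctl start mariadb && systemctl enable mariadb

# create the accounting database and grant the slurm user access to it
mysql -u root -e "CREATE DATABASE slurm_acct_db;"
mysql -u root -e "CREATE USER 'slurm'@'localhost' IDENTIFIED BY 'some_pass';"
mysql -u root -e "GRANT ALL ON slurm_acct_db.* TO 'slurm'@'localhost';"
```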
Install Slurm
- create slurm user (All Nodes)

  ```bash
  groupadd -g 1109 slurm
  useradd -m -c "slurm manager" -d /var/lib/slurm -u 1109 -g slurm -s /bin/bash slurm
  ```
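Slurm expects the slurm UID and GID to be identical on every node; a quick consistency check (a sketch, using the `<$rest_node>` placeholder from the later steps):

```bash
# the uid/gid printed here must match on every host
id slurm
ssh root@<$rest_node> "id slurm"
```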
Build RPM package
install dependencies (Manager Node)

```bash
yum -y install gcc gcc-c++ readline-devel perl-ExtUtils-MakeMaker pam-devel rpm-build mysql-devel python3
```

build the rpm packages (Manager Node)

```bash
wget https://download.schedmd.com/slurm/slurm-24.05.2.tar.bz2 -O slurm-24.05.2.tar.bz2
rpmbuild -ta --nodeps slurm-24.05.2.tar.bz2
```

The rpm files will be placed under the $(HOME)/rpmbuild directory of the user building them.

send the rpms to the rest of the nodes (Manager Node)

```bash
ssh root@<$rest_node> "mkdir -p /root/rpmbuild/RPMS/"
scp -rp $(HOME)/rpmbuild/RPMS/x86_64 root@<$rest_node>:/root/rpmbuild/RPMS/x86_64
```

install the rpms (Manager Node)

```bash
ssh root@<$rest_node> "yum localinstall /root/rpmbuild/RPMS/x86_64/slurm-*"
```

modify the configuration files (Manager Node)

```bash
cp /etc/slurm/cgroup.conf.example /etc/slurm/cgroup.conf
cp /etc/slurm/slurm.conf.example /etc/slurm/slurm.conf
cp /etc/slurm/slurmdbd.conf.example /etc/slurm/slurmdbd.conf
chmod 600 /etc/slurm/slurmdbd.conf
chown slurm: /etc/slurm/slurmdbd.conf
```

cgroup.conf doesn't need to be changed.
edit /etc/slurm/slurm.conf; you can use this link as a reference.
edit /etc/slurm/slurmdbd.conf; you can use this link as a reference. Minimal sketches of both files follow below.
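For orientation, a minimal `slurm.conf` sketch, assuming the hostnames from the prerequisites (`manager`, `compute01`, `compute02`) and 4-CPU compute nodes; treat every value as a placeholder to adapt:

```
# /etc/slurm/slurm.conf (minimal sketch)
ClusterName=cluster
SlurmctldHost=manager
SlurmUser=slurm
AuthType=auth/munge
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
SchedulerType=sched/backfill
SelectType=select/cons_tres
StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=manager
NodeName=compute[01-02] CPUs=4 State=UNKNOWN
PartitionName=normal Nodes=compute[01-02] Default=YES MaxTime=INFINITE State=UP
```

And a matching `slurmdbd.conf` sketch; `StoragePass` and `StorageLoc` must agree with the database created in the prerequisites:

```
# /etc/slurm/slurmdbd.conf (minimal sketch)
AuthType=auth/munge
DbdHost=manager
SlurmUser=slurm
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StoragePass=some_pass
StorageLoc=slurm_acct_db
```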
Install from the yum repo directly
install slurm (All Nodes)

```bash
yum -y install slurm-wlm slurmdbd
```

modify the configuration files (All Nodes)

```bash
vim /etc/slurm-llnl/slurm.conf
vim /etc/slurm-llnl/slurmdbd.conf
```

cgroup.conf doesn't need to be changed.
edit /etc/slurm/slurm.conf; you can use this link as a reference.
edit /etc/slurm/slurmdbd.conf; you can use this link as a reference. The sketches above apply here as well.
- send configuration (Manager Node)

  ```bash
  scp -r /etc/slurm/*.conf root@<$rest_node>:/etc/slurm/
  ssh root@<$rest_node> "mkdir /var/spool/slurmd && chown slurm: /var/spool/slurmd"
  ssh root@<$rest_node> "mkdir /var/log/slurm && chown slurm: /var/log/slurm"
  ssh root@<$rest_node> "mkdir /var/spool/slurmctld && chown slurm: /var/spool/slurmctld"
  ```

- start services (Manager Node)

  ```bash
  systemctl start slurmdbd && systemctl enable slurmdbd
  systemctl start slurmctld && systemctl enable slurmctld
  ```

- start service (All Nodes)

  ```bash
  systemctl start slurmd && systemctl enable slurmd
  ```
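Once the daemons are up, a quick sanity check from the manager node (a sketch; the cluster name `cluster` must match `ClusterName` in `slurm.conf`):

```bash
# confirm the daemons are running
systemctl status slurmdbd slurmctld        # on the manager node
ssh root@<$rest_node> "systemctl status slurmd"

# register the cluster with the accounting database if it is not listed yet
sacctmgr -i add cluster cluster
sacctmgr show cluster
```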
Test
- show cluster status

  ```bash
  scontrol show config
  sinfo
  scontrol show partition
  scontrol show node
  ```

- submit a job

  ```bash
  srun -N2 hostname
  scontrol show jobs
  ```

- check job status

  ```bash
  squeue -a
  ```
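Beyond the interactive `srun` test, a minimal batch script sketch; the job name, partition, and output pattern are placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=normal
#SBATCH --nodes=2
#SBATCH --output=%x-%j.out

# print the hostname of every allocated node
srun hostname
```

Submit it with `sbatch hello.sh`, watch it with `squeue -a`, and inspect the finished job with `sacct -j <jobid>`.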