Install From Binary
(All) means all node types should install this component.
(Mgr) means only the manager node should install this component.
(Auth) means only the auth node should install this component.
(Cmp) means only the compute nodes should install this component.
Typically, three types of nodes are required to run Slurm: one manager (Mgr), one auth node, and N compute nodes (Cmp), but you can choose to install all services on a single node. Check the prerequisites below first.
Prerequisites
- change hostname (All)

  ```bash
  hostnamectl set-hostname (manager|auth|computeXX)
  ```
- modify /etc/hosts (All)

  ```bash
  echo "192.aa.bb.cc (manager|auth|computeXX)" >> /etc/hosts
  ```
- disable firewall, selinux, dnsmasq, and swap (All); more detail here. A sketch of these steps follows this list.
- NFS Server (Mgr). NFS is used as the default file system for the Slurm accounting database; an export sketch follows this list.
- NFS Client (All). All nodes should mount the NFS share.
- Munge (All). The auth/munge plugin will be built if the MUNGE authentication development library is installed. MUNGE is used as the default authentication mechanism; a key-distribution sketch follows this list.
- Database (Mgr). MySQL support for accounting will be built if the MySQL or MariaDB development library is present. A currently supported version of MySQL or MariaDB should be used; a bootstrap sketch follows this list.
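A minimal sketch of the firewall/selinux/swap step, assuming RHEL/CentOS-style systemd hosts (which the guide's yum commands imply); run on every node:

```bash
# run on every node (assumes a RHEL/CentOS-style systemd host)
systemctl disable --now firewalld
systemctl disable --now dnsmasq          # only if dnsmasq is installed
setenforce 0                             # SELinux off for the current boot
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
swapoff -a                               # swap off now
sed -i '/ swap / s/^/#/' /etc/fstab      # and keep it off after reboots
```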
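An NFS sketch; the /data path and the 192.168.0.0/24 subnet are assumptions, adjust them to your network:

```bash
# manager: export a shared directory
yum -y install nfs-utils
mkdir -p /data                                            # assumed share path
echo "/data 192.168.0.0/24(rw,sync,no_root_squash)" >> /etc/exports
exportfs -ra
systemctl enable --now nfs-server

# all other nodes: mount the share at boot
yum -y install nfs-utils
mkdir -p /data
echo "manager:/data /data nfs defaults 0 0" >> /etc/fstab
mount -a
```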
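A MUNGE key-distribution sketch; `<$rest_node>` is the same placeholder used elsewhere in this guide, and the package names assume the EPEL munge packages:

```bash
# manager: install munge, generate one cluster-wide key, and copy it out
yum -y install munge munge-libs munge-devel
/usr/sbin/create-munge-key
scp -p /etc/munge/munge.key root@<$rest_node>:/etc/munge/munge.key
ssh root@<$rest_node> "chown munge: /etc/munge/munge.key && chmod 400 /etc/munge/munge.key"

# every node: start the daemon, then verify a credential round-trips
systemctl enable --now munge
munge -n | ssh root@<$rest_node> unmunge
```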
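A database bootstrap sketch for the manager; the database name, user, and password are assumptions and must match what you later put in slurmdbd.conf:

```bash
yum -y install mariadb-server mariadb-devel
systemctl enable --now mariadb
mysql -u root -p <<'SQL'
-- names and password below are assumptions; keep them in sync with slurmdbd.conf
CREATE DATABASE slurm_acct_db;
CREATE USER 'slurm'@'localhost' IDENTIFIED BY 'some_pass';
GRANT ALL ON slurm_acct_db.* TO 'slurm'@'localhost';
FLUSH PRIVILEGES;
SQL
```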
Install Slurm
- create slurm user (All)

  ```bash
  groupadd -g 1109 slurm
  useradd -m -c "slurm manager" -d /var/lib/slurm -u 1109 -g slurm -s /bin/bash slurm
  ```
Build RPM package
install dependencies (Mgr)

```bash
yum -y install gcc gcc-c++ readline-devel perl-ExtUtils-MakeMaker pam-devel rpm-build mysql-devel python3
```
build rpm package (Mgr)

```bash
wget https://download.schedmd.com/slurm/slurm-24.05.2.tar.bz2 -O slurm-24.05.2.tar.bz2
rpmbuild -ta --nodeps slurm-24.05.2.tar.bz2
```
The rpm files will be installed under the $HOME/rpmbuild directory of the user building them.

send rpm to the rest of the nodes (Mgr)

```bash
ssh root@<$rest_node> "mkdir -p /root/rpmbuild/RPMS/"
scp -r $HOME/rpmbuild/RPMS/x86_64 root@<$rest_node>:/root/rpmbuild/RPMS/x86_64
```
install rpm (Mgr)

```bash
ssh root@<$rest_node> "yum -y localinstall /root/rpmbuild/RPMS/x86_64/slurm-*"
```
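With more than one remaining node, the copy and install steps above can be wrapped in a loop. The node names below are assumptions; adjust them to your /etc/hosts entries:

```bash
# hypothetical helper: copy and install the RPMs on every non-manager node
for rest_node in auth compute01 compute02; do
  ssh root@${rest_node} "mkdir -p /root/rpmbuild/RPMS/"
  scp -r $HOME/rpmbuild/RPMS/x86_64 root@${rest_node}:/root/rpmbuild/RPMS/x86_64
  ssh root@${rest_node} "yum -y localinstall /root/rpmbuild/RPMS/x86_64/slurm-*"
done
```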
modify configuration file (Mgr)

```bash
cp /etc/slurm/cgroup.conf.example /etc/slurm/cgroup.conf
cp /etc/slurm/slurm.conf.example /etc/slurm/slurm.conf
cp /etc/slurm/slurmdbd.conf.example /etc/slurm/slurmdbd.conf
chmod 600 /etc/slurm/slurmdbd.conf
chown slurm: /etc/slurm/slurmdbd.conf
```
cgroup.conf does not need to be changed. Edit /etc/slurm/slurm.conf; you can use this link as a reference. Edit /etc/slurm/slurmdbd.conf; you can use this link as a reference. Minimal sketches of both files follow.
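A minimal slurm.conf sketch, reusing the manager/computeXX hostnames from the prerequisites; the cluster name, CPU count, node range, partition name, and log paths are assumptions:

```
# /etc/slurm/slurm.conf (sketch, not a complete reference)
ClusterName=cluster
SlurmctldHost=manager
SlurmUser=slurm
AuthType=auth/munge
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
SchedulerType=sched/backfill
SelectType=select/cons_tres
StateSaveLocation=/var/spool/slurmctld
SlurmdSpoolDir=/var/spool/slurmd
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=manager
JobAcctGatherType=jobacct_gather/cgroup
NodeName=compute[01-02] CPUs=4 State=UNKNOWN
PartitionName=normal Nodes=ALL Default=YES MaxTime=INFINITE State=UP
```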
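A matching slurmdbd.conf sketch; StorageUser, StoragePass, and StorageLoc are assumptions and must match the database bootstrap sketch in the prerequisites:

```
# /etc/slurm/slurmdbd.conf (sketch; keep this file mode 600, owned by slurm)
AuthType=auth/munge
DbdHost=manager
SlurmUser=slurm
DebugLevel=info
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StoragePass=some_pass
StorageLoc=slurm_acct_db
```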
Install from yum repo directly
install slurm (All)

```bash
yum -y install slurm-wlm slurmdbd
```
modify configuration file (All)

```bash
vim /etc/slurm-llnl/slurm.conf
vim /etc/slurm-llnl/slurmdbd.conf
```
cgroup.conf does not need to be changed. Edit /etc/slurm/slurm.conf; you can use this link as a reference. Edit /etc/slurm/slurmdbd.conf; you can use this link as a reference. The sketches in the previous section apply here as well.
- send configuration (Mgr)

  ```bash
  scp -r /etc/slurm/*.conf root@<$rest_node>:/etc/slurm/
  ssh root@<$rest_node> "mkdir -p /var/spool/slurmd && chown slurm: /var/spool/slurmd"
  ssh root@<$rest_node> "mkdir -p /var/log/slurm && chown slurm: /var/log/slurm"
  ssh root@<$rest_node> "mkdir -p /var/spool/slurmctld && chown slurm: /var/spool/slurmctld"
  ```
- start services (Mgr); slurmdbd and slurmctld run only on the manager, so start them locally

  ```bash
  systemctl start slurmdbd && systemctl enable slurmdbd
  systemctl start slurmctld && systemctl enable slurmctld
  ```
- start service (All)

  ```bash
  ssh root@<$rest_node> "systemctl start slurmd && systemctl enable slurmd"
  ```
Test
- show cluster status

  ```bash
  scontrol show config
  sinfo
  scontrol show partition
  scontrol show node
  ```
- submit job (a batch-script sketch follows this list)

  ```bash
  srun -N2 hostname
  scontrol show jobs
  ```
- check job status

  ```bash
  squeue -a
  ```
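srun runs the test interactively; the same check can be submitted as a batch job. A minimal script sketch (the job name and output file are arbitrary):

```bash
#!/bin/bash
# test.sh -- submit with: sbatch test.sh
#SBATCH --job-name=hello
#SBATCH --nodes=2
#SBATCH --output=hello_%j.out

# print the hostname of each allocated node into the output file
srun hostname
```

After the job completes, the hostnames of the two allocated nodes appear in hello_<jobid>.out, and the job shows up in squeue -a while it is pending or running.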