Install From Binary

(All) means all node types should install this component.

(Mgr) means only the Manager node should install this component.

(Auth) means only the Auth node should install this component.

(Cmp) means only the Compute nodes should install this component.

Typically, three node types are required to run Slurm: 1 Manager (Mgr), 1 Auth, and N Compute (Cmp). You can also choose to install all services on a single node.

Prerequisites

  1. change hostname (All)
    hostnamectl set-hostname (manager|auth|computeXX)
  2. modify /etc/hosts (All)
    echo "192.aa.bb.cc (manager|auth|computeXX)" >> /etc/hosts
  3. disable firewall, SELinux, dnsmasq, and swap (All), as sketched below
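    For example, on a systemd-based CentOS/RHEL system (unit names and paths are the usual defaults, adjust if yours differ):

    systemctl stop firewalld && systemctl disable firewalld
    setenforce 0
    sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
    systemctl stop dnsmasq && systemctl disable dnsmasq
    swapoff -a
    sed -i '/ swap / s/^/#/' /etc/fstab   # keep swap off after reboot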
  4. NFS Server (Mgr). NFS is used as the default shared file system for the cluster (see the sketch below).
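    A minimal server-side sketch, exporting /data to match the client mount below (the packages, export path, and export options are assumptions):

    yum -y install nfs-utils
    mkdir -p /data
    echo "/data *(rw,sync,no_root_squash)" >> /etc/exports
    exportfs -a
    systemctl start nfs-server && systemctl enable nfs-server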
  5. NFS Client (All). All nodes should mount the NFS share
    mount <$nfs_server>:/data /data -o proto=tcp -o nolock
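    To keep the mount across reboots, you can also record it in /etc/fstab (a sketch, assuming the same share and options):

    echo "<$nfs_server>:/data /data nfs proto=tcp,nolock 0 0" >> /etc/fstab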
  6. Munge (All). The auth/munge plugin will be built if the MUNGE authentication development library is installed. MUNGE is used as the default authentication mechanism.

    All nodes need to have the munge user and group.

    groupadd -g 1108 munge
    useradd -m -c "Munge Uid 'N' Gid Emporium" -d /var/lib/munge -u 1108 -g munge -s /sbin/nologin munge
    yum install epel-release -y
    yum install munge munge-libs munge-devel -y

    Create the global secret key (either of the following commands works)

    /usr/sbin/create-munge-key -r
    dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key

    Set permissions and start munge on the local node, then sync the secret to the rest of the nodes

    chown munge: /etc/munge/munge.key && chmod 400 /etc/munge/munge.key
    systemctl start munge && systemctl enable munge
    scp -p /etc/munge/munge.key root@<$rest_node>:/etc/munge/
    ssh root@<$rest_node> "chown munge: /etc/munge/munge.key && chmod 400 /etc/munge/munge.key"
    ssh root@<$rest_node> "systemctl start munge && systemctl enable munge"

    Test that munge works

    munge -n | unmunge
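
    To validate cross-node authentication as well, encode a credential locally and decode it on another node:

    munge -n | ssh <$rest_node> unmunge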
  7. Database (Mgr). MySQL support for accounting will be built if the MySQL or MariaDB development library is present. A currently supported version of MySQL or MariaDB should be used.

    Install MariaDB and set a root password (the original CREATE USER would fail since root already exists, so set its password instead)

    yum -y install mariadb-server
    systemctl start mariadb && systemctl enable mariadb
    ROOT_PASS=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 16)
    mysqladmin -u root password "${ROOT_PASS}"
    echo "MariaDB root password: ${ROOT_PASS}"

    Log in to MySQL and create the accounting database (the slurm user's password must match StoragePass in slurmdbd.conf)

    mysql -u root -p${ROOT_PASS}
    create database slurm_acct_db;
    grant all on slurm_acct_db.* to 'slurm'@'localhost' identified by '123456' with grant option;
    flush privileges;
    quit
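
    You can check that the slurm account can reach the database (assuming the password above):

    mysql -u slurm -p123456 -e "show databases;" | grep slurm_acct_db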

Install Slurm

  1. create slurm user (All)
    groupadd -g 1109 slurm
    useradd -m -c "slurm manager" -d /var/lib/slurm -u 1109 -g slurm -s /bin/bash slurm
You can install Slurm either by building RPM packages from the source tarball, or directly from a yum repository.

Build RPM package

  1. install dependencies (Mgr)

    yum -y install gcc gcc-c++ readline-devel perl-ExtUtils-MakeMaker pam-devel rpm-build mysql-devel python3
  2. build rpm package (Mgr)

    wget https://download.schedmd.com/slurm/slurm-24.05.2.tar.bz2 -O slurm-24.05.2.tar.bz2
    rpmbuild -ta --nodeps slurm-24.05.2.tar.bz2

    The RPM files will be written under the $HOME/rpmbuild directory of the user building them.

  3. send RPMs to the rest of the nodes (Mgr)

    ssh root@<$rest_node> "mkdir -p /root/rpmbuild/RPMS/"
    scp -pr $HOME/rpmbuild/RPMS/x86_64 root@<$rest_node>:/root/rpmbuild/RPMS/
  4. install rpm (Mgr)

    yum -y localinstall $HOME/rpmbuild/RPMS/x86_64/slurm-*
    ssh root@<$rest_node> "yum -y localinstall /root/rpmbuild/RPMS/x86_64/slurm-*"
  5. modify configuration file (Mgr)

    cp /etc/slurm/cgroup.conf.example /etc/slurm/cgroup.conf
    cp /etc/slurm/slurm.conf.example /etc/slurm/slurm.conf
    cp /etc/slurm/slurmdbd.conf.example /etc/slurm/slurmdbd.conf
    chmod 600 /etc/slurm/slurmdbd.conf
    chown slurm: /etc/slurm/slurmdbd.conf

    cgroup.conf does not need to be changed.

    edit /etc/slurm/slurm.conf; the online configurator at https://slurm.schedmd.com/configurator.html can generate a starting point. A minimal sketch follows.
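
    A minimal slurm.conf sketch, assuming the manager host is named manager and two compute nodes compute[01-02] with 4 CPUs each (all host names and sizes here are assumptions, adjust to your cluster):

    ClusterName=cluster
    SlurmctldHost=manager
    SlurmUser=slurm
    AuthType=auth/munge
    StateSaveLocation=/var/spool/slurmctld
    SlurmdSpoolDir=/var/spool/slurmd
    SlurmctldLogFile=/var/log/slurm/slurmctld.log
    SlurmdLogFile=/var/log/slurm/slurmd.log
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup
    SchedulerType=sched/backfill
    SelectType=select/cons_tres
    AccountingStorageType=accounting_storage/slurmdbd
    AccountingStorageHost=manager
    NodeName=compute[01-02] CPUs=4 State=UNKNOWN
    PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP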

    edit /etc/slurm/slurmdbd.conf; the slurmdbd.conf man page documents all options. A minimal sketch follows.
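
    A minimal slurmdbd.conf sketch, assuming slurmdbd runs on the manager alongside MariaDB and uses the database user created earlier (StoragePass must match the grant statement):

    AuthType=auth/munge
    DbdHost=localhost
    SlurmUser=slurm
    LogFile=/var/log/slurm/slurmdbd.log
    PidFile=/var/run/slurmdbd.pid
    StorageType=accounting_storage/mysql
    StorageHost=localhost
    StorageUser=slurm
    StoragePass=123456
    StorageLoc=slurm_acct_db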

Install from yum repo directly

  1. install slurm (All)

    yum -y install slurm slurm-slurmctld slurm-slurmd slurm-slurmdbd
  2. modify configuration file (All)

    vim /etc/slurm/slurm.conf
    vim /etc/slurm/slurmdbd.conf

    cgroup.conf does not need to be changed. Edit /etc/slurm/slurm.conf and /etc/slurm/slurmdbd.conf as described in the previous section (the sketches above apply here as well).

  3. send configuration and create runtime directories (Mgr)
     scp -r /etc/slurm/*.conf root@<$rest_node>:/etc/slurm/
     mkdir -p /var/spool/slurmd /var/log/slurm /var/spool/slurmctld
     chown slurm: /var/spool/slurmd /var/log/slurm /var/spool/slurmctld
     ssh root@<$rest_node> "mkdir -p /var/spool/slurmd && chown slurm: /var/spool/slurmd"
     ssh root@<$rest_node> "mkdir -p /var/log/slurm && chown slurm: /var/log/slurm"
     ssh root@<$rest_node> "mkdir -p /var/spool/slurmctld && chown slurm: /var/spool/slurmctld"
  4. start services on the manager (Mgr); slurmdbd and slurmctld run locally on the manager, not on the other nodes
    systemctl start slurmdbd && systemctl enable slurmdbd
    systemctl start slurmctld && systemctl enable slurmctld
  5. start slurmd (All)
    ssh root@<$rest_node> "systemctl start slurmd && systemctl enable slurmd"
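
    A quick check that the daemons came up (a sketch):

    systemctl status slurmdbd slurmctld              # on the manager
    ssh root@<$rest_node> "systemctl status slurmd"  # on each other node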

Test

  1. show cluster status
    scontrol show config
    sinfo
    scontrol show partition
    scontrol show node
  2. submit a job
    srun -N2 hostname
    scontrol show jobs
  3. check job status
    squeue -a
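
Beyond srun, a minimal batch script exercises sbatch (the file name test_job.sh and its options are arbitrary examples):

    #!/bin/bash
    #SBATCH --job-name=test
    #SBATCH --nodes=1
    #SBATCH --output=test_%j.out
    hostname

Submit it with sbatch test_job.sh and follow it with squeue or scontrol show jobs.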

References:

  1. https://slurm.schedmd.com/documentation.html
  2. https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/
  3. https://github.com/Artlands/Install-Slurm