openchami-wiki/Use Cases/Advanced Use Cases.md

9.4 KiB

After going through the tutorial, you should be familiar and comfortable enough with OpenCHAMI to make changes to the deployment process and configuration. We're going to cover some of the more common use cases that an OpenCHAMI user might want to pursue.

At this point, we can use what we have learned so far in the OpenCHAMI tutorial to customize our nodes in various ways, such as changing how we serve images, deriving new images, and updating our cloud-init config. This section explores some of the use cases you may want to pursue to adapt OpenCHAMI to your own needs.

Some of these use cases include:

  1. Adding SLURM and MPI to the Compute Node
  2. Serving the Root Filesystem with NFS
  3. Enable WireGuard Security for the cloud-init-server
  4. Using Image Layers to Customize Boot Image with a Common Base
  5. Using kexec to Reboot Nodes for a Kernel Upgrade or Specialized Kernel
  6. Discovering Nodes Dynamically with Redfish

Adding SLURM and MPI to the Compute Node

After getting our nodes to boot using our compute images, let's try running a test MPI job. To do so, we need to install and configure both SLURM and MPI. We can do this in at least two ways:

  • Build the packages into the image: create a new compute-slurm image, similar to the compute-debug image, using the compute-base image as a base. You do not have to rebuild the parent images unless you want to make changes to them, but keep in mind that you will then also have to rebuild any derivative images.
  • Install the packages at boot time: add them to the node's cloud-init config so they are installed after the node boots.

Both approaches are covered below.

Building Into the Image

We can use the image-build tool to include the SLURM and OpenMPI packages directly in the image. Since we're building a new image for our compute node, we'll base it on the compute image definition from the tutorial.

You should already have a directory at /opt/workdir/images. Verify that the base compute image is already in S3 with s3cmd ls, as shown below.
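Assuming the boot-images bucket and compute/base/ prefix from the tutorial's image definitions, the check looks something like this:

s3cmd ls s3://boot-images/compute/base/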

# TODO: put the output of `s3cmd ls` here with the compute-base image

If you do not have the image, go back to this step in the tutorial, build the image, and push it to S3. Once you have done that, proceed to the next step.

Now, create a new file at /opt/workdir/images/compute-slurm-rocky9.yaml and copy the contents below into it.

options:
  layer_type: 'base'
  name: 'compute-slurm'
  publish_tags:
    - 'rocky9'
  pkg_manager: 'dnf'
  parent: 'demo.openchami.cluster:5000/demo/rocky-base:9'
  registry_opts_pull:
    - '--tls-verify=false'

  # Publish SquashFS image to local S3
  publish_s3: 'http://demo.openchami.cluster:9000'
  s3_prefix: 'compute/base/'
  s3_bucket: 'boot-images'

  # Publish OCI image to container registry
  #
  # This is the only way to be able to re-use this image as
  # a parent for another image layer.
  publish_registry: 'demo.openchami.cluster:5000/demo'
  registry_opts_push:
    - '--tls-verify=false'

repos:
  - alias: 'Epel9'
    url: 'https://dl.fedoraproject.org/pub/epel/9/Everything/x86_64/'
    gpg: 'https://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-9'

packages:
  - slurm
  - openmpi

Notice that the only changes in this new image definition are options.name and the packages list. Since we're basing this image on another image, we only need to list the packages we want to add to it. We can now build the image and push it to S3.

podman run --rm --device /dev/fuse --network host -e S3_ACCESS=admin -e S3_SECRET=admin123 -v /opt/workdir/images/compute-slurm-rocky9.yaml:/home/builder/config.yaml ghcr.io/openchami/image-build:latest image-build --config config.yaml --log-level DEBUG

Wait until the build finishes, then check the S3 bucket with s3cmd ls again to confirm the new image is there. Next, add a new boot script at /opt/workdir/boot/boot-compute-slurm.yaml, which we will use to boot our compute nodes.

kernel: 'http://172.16.0.254:9000/boot-images/efi-images/compute/debug/vmlinuz-5.14.0-570.21.1.el9_6.x86_64'
initrd: 'http://172.16.0.254:9000/boot-images/efi-images/compute/debug/initramfs-5.14.0-570.21.1.el9_6.x86_64.img'
params: 'nomodeset ro root=live:http://172.16.0.254:9000/boot-images/compute/debug/rocky9.6-compute-slurm-rocky9 ip=dhcp overlayroot=tmpfs overlayroot_cfgdisk=disabled apparmor=0 selinux=0 console=ttyS0,115200 ip6=off cloud-init=enabled ds=nocloud-net;s=http://172.16.0.254:8081/cloud-init'
macs:
  - 52:54:00:be:ef:01
  - 52:54:00:be:ef:02
  - 52:54:00:be:ef:03
  - 52:54:00:be:ef:04
  - 52:54:00:be:ef:05

Set the boot parameters and confirm that they have been applied correctly.

ochami bss boot params set -f yaml -d @/opt/workdir/boot/boot-compute-slurm.yaml
ochami bss boot params get -F yaml

Finally, boot the compute node.

sudo virt-install \
  --name compute1 \
  --memory 4096 \
  --vcpus 1 \
  --disk none \
  --pxe \
  --os-variant centos-stream9 \
  --network network=openchami-net,model=virtio,mac=52:54:00:be:ef:01 \
  --graphics none \
  --console pty,target_type=serial \
  --boot network,hd \
  --boot loader=/usr/share/OVMF/OVMF_CODE.secboot.fd,loader.readonly=yes,loader.type=pflash,nvram.template=/usr/share/OVMF/OVMF_VARS.fd,loader_secure=no \
  --virt-type kvm

Your compute node should start up with iPXE output. If your node does not boot, check the troubleshooting sections for common issues.
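If you want to watch the boot from the head node, you can attach to the VM's serial console (the name compute1 comes from the virt-install command above; press Ctrl+] to disconnect):

sudo virsh console compute1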

Installing via Cloud-Init

Alternatively, we can install the necessary SLURM and MPI packages after boot by adding a packages section to our cloud-init config and using the runcmd section for configuration.

Let's start by making changes to the cloud-init config file at /opt/workdir/cloud-init/computes.yaml that we used previously. Note that we are using pre-built RPMs to install SLURM and OpenMPI from the Rocky 9 repos.

- name: compute
  description: "compute config"
  file:
    encoding: plain
    content: |
      ## template: jinja
      #cloud-config
      merge_how:
      - name: list
        settings: [append]
      - name: dict
        settings: [no_replace, recurse_list]
      users:
        - name: root
          ssh_authorized_keys: {{ ds.meta_data.instance_data.v1.public_keys }}
      disable_root: false
      packages:
        - slurm
        - openmpi
      runcmd:
        - systemctl enable --now slurmd

We added the packages section to tell cloud-init to install the slurm and openmpi packages after the compute node boots, and the runcmd section to enable the SLURM daemon on the node.
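Once the compute node has booted with the updated config, you can spot-check that cloud-init ran and that the packages landed, for example by logging into the node as root and running:

cloud-init status --long
rpm -q slurm openmpi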

Prepare SLURM on Head Node
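This step is site-specific, so treat the following as a minimal sketch rather than a definitive configuration: it assumes the head node's hostname is head (acting as the SLURM controller) and five compute nodes named compute1 through compute5, matching the VMs booted above; all names and counts below are placeholders. The head node needs munge and slurmctld running, and the same slurm.conf and munge key must also end up on the compute nodes (for example, baked into the image or distributed via cloud-init).

# /etc/slurm/slurm.conf -- minimal sketch; adjust hostnames, CPU counts, and partition names for your site
ClusterName=demo
SlurmctldHost=head
AuthType=auth/munge
SelectType=select/cons_tres
SelectTypeParameters=CR_Core

NodeName=compute[1-5] CPUs=1 State=UNKNOWN
PartitionName=compute Nodes=compute[1-5] Default=YES MaxTime=INFINITE State=UP

With the config and munge key in place, enable the services on the head node (and slurmd on the computes, as in the cloud-init runcmd above):

sudo systemctl enable --now munge slurmctld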

Run a Sample MPI Job Across Two VMs

After we have installed both SLURM and OpenMPI on the compute node, let's try launching a "hello world" MPI job. To do so, we will need three things:

  1. Source code for the MPI program
  2. The compiled MPI executable
  3. A SLURM job script

We'll write the MPI program in C. First, create a new directory to store our source code. Then, edit the /opt/workdir/apps/mpi/hello/hello.c file.

mkdir -p /opt/workdir/apps/mpi/hello
# edit /opt/workdir/apps/mpi/hello/hello.c

Now copy the contents below into the hello.c file.

/* The Parallel Hello World Program */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
   int node;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &node);

   printf("Hello World from Node %d\n", node);

   MPI_Finalize();
   return 0;
}

Compile the program.

cd /opt/workdir/apps/mpi/hello
mpicc hello.c -o hello

You should now have a hello executable in the /opt/workdir/apps/mpi/hello directory. We can use this binary with SLURM to launch processes in parallel.
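Before handing the binary to SLURM, you can optionally sanity-check it by hand with OpenMPI's launcher on the machine where it was built (assuming mpirun is on your PATH):

mpirun -np 2 /opt/workdir/apps/mpi/hello/hello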

Let's create a job script to launch the executable we just created. Create a new directory to hold our SLURM job script. Then, edit a new file called launch-hello.sh in the new /opt/workdir/jobscripts directory.

mkdir -p /opt/workdir/jobscripts
cd /opt/workdir/jobscripts
# edit launch-hello.sh

Copy the contents below into the launch-hello.sh job script.

Note

The contents of your job script may vary significantly depending on your cluster. Refer to your institution's documentation and adjust the script to your needs.

#!/bin/bash

#SBATCH --job-name=hello
#SBATCH --account=account_name
#SBATCH --partition=partition_name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:00:30

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Launch the MPI executable
srun /opt/workdir/apps/mpi/hello/hello

We should now have everything we need to test our MPI job with our compute node(s). Launch the job with the sbatch command.

sbatch /opt/workdir/jobscripts/launch-hello.sh

We can confirm the job is running with the squeue command.

squeue

You should see a job named hello in the list, matching the name given in the launch-hello.sh job script.

# TODO: add output of squeue above

If you saw the output above, you should now be able to inspect the output of the job when it completes.

# TODO: add output of MPI job (should be something like hello.o and/or hello.e)

And that's it! You have successfully launched an MPI job with SLURM on an OpenCHAMI-deployed system.