Compare commits


2 commits

SHA1        Message                                     Date
ce68817463  chore: add .gitignore                       2025-07-22 12:55:23 -06:00
60b4a0cf8b  chore: fixed titles missing from documents  2025-07-22 12:54:37 -06:00
17 changed files with 27 additions and 0 deletions

.gitignore (new file, 1 addition)

@@ -0,0 +1 @@
.obsidian

@@ -1,3 +1,5 @@
# Deployments
OpenCHAMI microservices can be deployed in several ways. This document covers the supported deployment methods:
- [[Deploying with Podman Quadlets]]

@@ -1,3 +1,5 @@
# Getting Started
OpenCHAMI provides a [tutorial](https://github.com/OpenCHAMI/tutorial-2025) to introduce new users to the project. This tutorial demonstrates how to quickly jump-start a development environment with the OpenCHAMI services using Podman quadlets and `systemd`. The main part of the tutorial is organized into two phases that cover the following topics:
1. Preparing Head Node or Instance

@@ -0,0 +1 @@
# Magellan

@@ -1,3 +1,5 @@
# Software
The OpenCHAMI project contains a collection of software built to discover, manage, and provision nodes. This section contains a brief introduction and user guide for each tool or service to help you get started quickly.
- **[Magellan](Magellan.md)** - Redfish-based tool for automatic node discovery and firmware management

@@ -0,0 +1,2 @@
# State Management Database (SMD)

@@ -1,3 +1,5 @@
# Troubleshooting
Sometimes things don't work out as expected when installing the services or booting nodes. Whether your issue is related to the services or to their configuration, this section covers issues you may run into when working with OpenCHAMI. Keep in mind that this list is continuously updated as the software changes.
### Services Not Starting

@@ -1,3 +1,4 @@
# AWS Tutorial Environment
For this tutorial, you will be provided with your own EC2 instance and an SSH key to access it. If you would like to replicate the environment outside the tutorial, here are the relevant details.

@@ -1,3 +1,4 @@
# Jetstream2 Tutorial Environment
For this tutorial, you will be provided with your own compute instance and an SSH key to access it. If you would like to replicate the environment outside the tutorial, here are the relevant details.

@@ -1,3 +1,4 @@
# OpenCHAMI Tutorial
Welcome to the OpenCHAMI hands-on tutorial! This guide walks you through building a complete PXE-boot & cloud-init environment for HPC compute nodes using libvirt/KVM.

@@ -1,3 +1,5 @@
# Adding SLURM and MPI to the Compute Node
After getting our nodes to boot using our compute images, let's try running a test MPI job. To do so, we need to install and configure both SLURM and MPI. There are at least two ways to do this:
- Create a new `compute-mpi` image similar to the `compute-debug` image using the `compute-base` image as a base. You do not have to rebuild the parent images unless you want to make changes to them, but keep in mind that you will also have to rebuild any derivative images.

@@ -1,3 +1,4 @@
# Advanced Use Cases
After going through the [tutorial](https://github.com/OpenCHAMI/tutorial-2025), you should be familiar and comfortable enough with OpenCHAMI to make changes to the deployment process and configuration. We're going to cover some of the more common use cases that an OpenCHAMI user would want to pursue.
At this point, we can use what we have learned in the OpenCHAMI tutorial to customize our nodes in various ways, such as changing how we serve images, deriving new images, and updating our cloud-init config. This section explores use cases that can help you adapt OpenCHAMI to your own needs.

@@ -1,3 +1,5 @@
# Discovering Nodes Dynamically with Redfish
In the tutorial, we used static discovery to populate our inventory in SMD instead of dynamically discovering nodes on our network. Static discovery works well when we know the MAC address, IP address, xname, and/or node ID of our nodes beforehand, and it guarantees deterministic behavior. However, sometimes we might not know these properties, or we may want to check the current state of our hardware, for example to look for failures. In these scenarios, we can probe our hardware dynamically using the scanning feature of `magellan` and then update the state of our inventory.
For this demonstration, we have two prerequisites:

@@ -1,3 +1,5 @@
# Enable WireGuard Security for the `cloud-init-server`
When nodes boot in OpenCHAMI, they make a request out to the `cloud-init-server` to retrieve a cloud-init config. The request is not encrypted and can be intercepted and modified.
# Using WireGuard with Cloud-Init

@@ -1,3 +1,5 @@
# Serving the Root Filesystem with NFS (import-image.sh)
For the [tutorial](https://github.com/OpenCHAMI/tutorial-2025), we served images via HTTP with a local S3 bucket using MinIO and an OCI registry. We could instead serve our images by network-mounting the directories that hold them with NFS. We can spin up an NFS server on the head node by including NFS tools in our base image, and then configure our nodes to mount the images.
Configure NFS to serve your SquashFS `nfsroot` for maximum performance.

@@ -1 +1,3 @@
# Using Image Layers to Customize Boot Image with a Common Base
Often, we want to allocate nodes for different purposes using different images. Let's take the `base` image we created earlier and build another Kubernetes layer on top of it called `kubernetes-worker`. We would then need to modify the boot script to use this new Kubernetes image and update cloud-init to set up the nodes.

@@ -0,0 +1 @@
# Using `kexec` to Reboot Nodes For an Upgrade or Specialized Kernel
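The summary line (17 changed files with 27 additions and 0 deletions) can be cross-checked against the hunk headers on this page. In a unified-diff hunk header `@@ -a,b +c,d @@`, `b` is the number of old lines and `d` the number of new lines covered by the hunk, context included; an omitted count means one line, and `-0,0` marks a newly created file. Because context lines appear in both counts, `d - b` equals additions minus deletions, which here is exactly the number of added lines. The following Python snippet is a minimal sketch, assuming one hunk per file as displayed on this page; the helper names `line_counts` and `net_added` are hypothetical, and the comments give abbreviated document titles because the file paths were not captured.

```python
import re

# Unified-diff hunk header: "@@ -old_start[,old_count] +new_start[,new_count] @@".
# A missing count means the range spans exactly one line.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def line_counts(header: str) -> tuple[int, int]:
    """Return (old_count, new_count) parsed from one hunk header."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_count = int(m.group(2)) if m.group(2) is not None else 1
    new_count = int(m.group(4)) if m.group(4) is not None else 1
    return old_count, new_count


def net_added(headers: list[str]) -> int:
    """Sum new_count - old_count per hunk, i.e. additions minus deletions."""
    return sum(new - old for old, new in map(line_counts, headers))


# One hunk per changed file, in page order (assumption: no file has more
# than one hunk). Comments give abbreviated document titles, not file paths.
HUNKS = [
    "@@ -0,0 +1 @@",    # .gitignore
    "@@ -1,3 +1,5 @@",  # Deployments
    "@@ -1,3 +1,5 @@",  # Getting Started
    "@@ -0,0 +1 @@",    # Magellan
    "@@ -1,3 +1,5 @@",  # Software
    "@@ -0,0 +1,2 @@",  # State Management Database (SMD)
    "@@ -1,3 +1,5 @@",  # Troubleshooting
    "@@ -1,3 +1,4 @@",  # AWS Tutorial Environment
    "@@ -1,3 +1,4 @@",  # Jetstream2 Tutorial Environment
    "@@ -1,3 +1,4 @@",  # OpenCHAMI Tutorial
    "@@ -1,3 +1,5 @@",  # Adding SLURM and MPI to the Compute Node
    "@@ -1,3 +1,4 @@",  # Advanced Use Cases
    "@@ -1,3 +1,5 @@",  # Discovering Nodes Dynamically with Redfish
    "@@ -1,3 +1,5 @@",  # Enable WireGuard Security for the cloud-init-server
    "@@ -1,3 +1,5 @@",  # Serving the Root Filesystem with NFS
    "@@ -1 +1,3 @@",    # Using Image Layers to Customize Boot Image
    "@@ -0,0 +1 @@",    # Using kexec to Reboot Nodes
]

if __name__ == "__main__":
    # Compare with the page summary line: prints "17 27".
    print(len(HUNKS), net_added(HUNKS))
```

Summing per-hunk differences rather than counting `+` lines means the check works even though the added-line markers were lost from this rendering.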