chore: added tutorial section
parent
b7614eeea9
commit
6908f9bdd1
6 changed files with 2094 additions and 8 deletions
119
Tutorial/OpenCHAMI Tutorial.md
Normal file
@@ -0,0 +1,119 @@
Welcome to the OpenCHAMI hands-on tutorial! This guide walks you through building a complete PXE-boot & cloud-init environment for HPC compute nodes using libvirt/KVM.

---

## 📋 Prerequisites

The cloud-based instance provided for this class is detailed in [AWS_Environment.md](/AWS_Environment.md). Your instance must meet these requirements before you begin:

- **OS & Kernel**:
  - RHEL/CentOS/Rocky 9+ or equivalent
  - Linux kernel ≥ 5.10 with cgroups v2 enabled
- **Packages** (minimum versions):
  - QEMU 6.x, `virt-install` ≥ 4.x
  - Podman 4.x
- **Networking**:
  - Bridge device (e.g. `br0`)
- **Storage**:
  - NFS (or equivalent) export for `/var/lib/ochami/images`
  - MinIO (or S3) with credentials ready
  - OCI container registry with credentials ready
- **Tools**:
  - `tcpdump`, `tftp`, `virsh`, `curl`
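
Some of the requirements above can be checked before starting. The following is a minimal preflight sketch, not part of the OpenCHAMI tooling: the function names are hypothetical, the thresholds and tool list are taken from the list above, and the cgroups v2 test assumes the unified hierarchy is mounted at `/sys/fs/cgroup`. It only prints results and changes nothing on the host.

```shell
#!/bin/sh
# Preflight sketch (hypothetical helper names; thresholds come from the list above).

# Succeeds when a kernel release string such as "5.14.0-427.el9" is >= MAJOR.MINOR.
kernel_at_least() {
    release=$1 want_major=$2 want_minor=$3
    major=${release%%.*}        # text before the first dot
    rest=${release#*.}          # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Prints each command from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# cgroups v2 exposes cgroup.controllers at the root of the unified hierarchy.
# The mount point is an assumption and can be overridden with the first argument.
cgroup_v2_enabled() {
    [ -f "${1:-/sys/fs/cgroup}/cgroup.controllers" ]
}

if kernel_at_least "$(uname -r)" 5 10; then echo "kernel: ok"; else echo "kernel: older than 5.10"; fi
if cgroup_v2_enabled; then echo "cgroups v2: ok"; else echo "cgroups v2: not detected"; fi
missing=$(missing_tools tcpdump tftp virsh curl)
if [ -z "$missing" ]; then echo "tools: ok"; else echo "tools missing:" $missing; fi
```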

---

## 🗺️ Conceptual Data Flows

A quick snapshot of the data flows:

1. **Discovery**: Head node learns about virtual nodes via `ochami discover`.
2. **Image Build**: Containerized image layers → squashfs image → tracked in the registry and served via S3.
3. **Provisioning**: PXE boot → TFTP pulls kernel/initrd → installer.
4. **Config & Join**: cloud-init applies user-data and finalizes the OS.

---

## 🚀 Phased Tutorial Outline

> Each “Phase” is a self-contained lab with a checkpoint exercise.

### Phase I — Platform Setup

1. **Instance Preparation**
   - Host packages, kernel modules, cgroups, bridge setup, NFS setup
   - Deploy MinIO, nginx, and registry
   - Checkpoints:
     - `systemctl status minio`
     - `systemctl status registry`
2. **OpenCHAMI & Core Services**
   - Install OpenCHAMI RPMs
   - Deploy internal Certificate Authority and import signing certificate
   - Checkpoints:
     - `ochami bss status`
     - `systemctl list-dependencies openchami.target`
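
The Phase I checkpoints can be run in one pass. The sketch below wraps the four checkpoint commands listed above in a small helper that prints PASS or FAIL for each. The helper names `checkpoint` and `phase1_checkpoints` are hypothetical, and the sketch assumes each command exits non-zero when its service is unhealthy, which may not hold for every command (for example, `systemctl list-dependencies` mainly lists units).

```shell
#!/bin/sh
# Runs a checkpoint command quietly and prints PASS or FAIL next to its label.
checkpoint() {
    label=$1; shift
    if "$@" >/dev/null 2>&1; then
        printf 'PASS  %s\n' "$label"
    else
        printf 'FAIL  %s\n' "$label"
        return 1
    fi
}

# Runs every Phase I checkpoint from the outline and counts the failures.
phase1_checkpoints() {
    failures=0
    checkpoint "MinIO service"    systemctl status minio    || failures=$((failures + 1))
    checkpoint "registry service" systemctl status registry || failures=$((failures + 1))
    checkpoint "BSS status"       ochami bss status         || failures=$((failures + 1))
    checkpoint "OpenCHAMI target" systemctl list-dependencies openchami.target ||
        failures=$((failures + 1))
    printf '%s checkpoint(s) failed\n' "$failures"
    [ "$failures" -eq 0 ]
}
```

After sourcing the file, run `phase1_checkpoints`. For any FAIL line, rerun that command by hand to see its full output.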

### Phase II — Boot & Image Infrastructure

3. **Static Discovery & SMD Population**
   - Anatomy of `nodes.yaml`, `ochami discover`
   - Checkpoint: `ochami smd component get | jq '.Components[] | select(.Type == "Node")'`
4. **Image Builder**
   - Define base, compute, debug container layers
   - Build & push to registry/S3
   - Checkpoints:
     - `s3cmd ls -Hr s3://boot-images/`
     - `regctl tag ls demo.openchami.cluster:5000/demo/rocky-base`
5. **PXE Boot Configuration**
   - `boot.yaml`, BSS parameters, virt-install examples
   - Verify DHCP options & TFTP with `tcpdump`, `tftp`
   - Checkpoint: Installer runs successfully on the serial console
6. **Cloud-Init Configuration**
   - Merging `cloud-init.yaml`, host-group overrides
   - Customizing users, networking, mounts
   - Checkpoint: Inspect `/var/log/cloud-init.log` on node
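
For the cloud-init checkpoint, reading the whole log by eye is slow. The sketch below filters it for lines at WARNING level or above. It assumes cloud-init's default log format, where the level appears in square brackets after the module name (for example `util.py[WARNING]:`); the function name `cloud_init_problems` is hypothetical.

```shell
#!/bin/sh
# Prints WARNING/ERROR/CRITICAL lines from a cloud-init log.
# Returns 0 when none are found, 1 when problems are printed, 2 when the log is unreadable.
cloud_init_problems() {
    log=${1:-/var/log/cloud-init.log}
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    if grep -E '\[(WARNING|ERROR|CRITICAL)\]' "$log"; then
        return 1
    fi
    return 0
}
```

On a booted node, `cloud_init_problems && echo "cloud-init log is clean"` gives a quick pass/fail signal. A clean log does not prove every module did what you intended, so still spot-check users, mounts, and network settings.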

### Phase III — Post-Boot & Use Cases

7. **Virtual Compute Nodes & Demo**
   - `virsh console`, node reboot workflows, cleanup scripts
   - Scaling to multiple nodes with a looped script
   - Checkpoint: Run a sample MPI job across two VMs
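
The "looped script" idea for scaling to several nodes can be sketched as a dry run that prints one `virt-install` command per node instead of executing it. Everything node-specific here is a hypothetical placeholder: the `compute<N>` names, the `52:54:00:be:ef:xx` MAC scheme, the memory and vCPU sizes, and the `NODE_COUNT` variable. The bridge `br0` comes from the prerequisites, and `--osinfo detect=on,require=off` is an assumption to satisfy the OS-name requirement in `virt-install` 4.x. Review the printed commands, and make sure the MAC addresses match your `nodes.yaml`, before running any of them.

```shell
#!/bin/sh
# Dry-run sketch: prints virt-install commands for PXE-booted VMs without running them.

# Maps a node index to a MAC address in the QEMU/KVM 52:54:00 prefix (hypothetical scheme).
node_mac() {
    printf '52:54:00:be:ef:%02x\n' "$1"
}

# Prints the virt-install command for one diskless, PXE-booting node.
print_pxe_vm_cmd() {
    index=$1
    printf 'virt-install --name compute%s --memory 4096 --vcpus 2 --disk none --pxe --osinfo detect=on,require=off --network bridge=br0,mac=%s --graphics none --noautoconsole\n' \
        "$index" "$(node_mac "$index")"
}

# Loop over the requested number of nodes (defaults to 2, matching the MPI checkpoint).
i=1
while [ "$i" -le "${NODE_COUNT:-2}" ]; do
    print_pxe_vm_cmd "$i"
    i=$((i + 1))
done
```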

---

## 🔧 Troubleshooting & Tips

- **PXE ROM silent on serial**
  - The BIOS stage outputs to VGA only; use `--extra-args 'console=ttyS0,115200n8 inst.text'` so the kernel and installer log to the serial console
- **No DHCP OFFER**
  - Verify via `sudo tcpdump -i br0 port 67 or 68`
- **Service fails to start**
  - Inspect `journalctl -u <service name>`, check port conflicts
- **Certificate Issues**
  - Ensure the system trust bundle contains our root cert: `grep CHAMI /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem`
- **Token Issues**
  - Tokens are only valid for an hour. Renew with `export DEMO_ACCESS_TOKEN=$(sudo bash -lc 'gen_access_token')` in each terminal window
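
Because tokens expire after an hour, it can help to renew them shortly before that. The sketch below wraps the renewal command from the tip above in a function that refreshes the token only when it is missing or close to expiry. The variable `DEMO_TOKEN_ISSUED`, the five-minute margin, and both function names are hypothetical additions. The file must be sourced (not executed) so the exported variables land in your current shell; then call `refresh_token_if_needed` before a batch of `ochami` commands.

```shell
#!/bin/sh
# Token lifetime from the tutorial (one hour) and a hypothetical early-refresh margin.
TOKEN_LIFETIME=3600
REFRESH_MARGIN=300

# Succeeds when a token issued at $1 (epoch seconds) should be renewed at time $2.
token_needs_refresh() {
    issued=$1 now=$2
    [ $((now - issued)) -ge $((TOKEN_LIFETIME - REFRESH_MARGIN)) ]
}

# Renews DEMO_ACCESS_TOKEN with the tutorial's command when it is unset or nearly expired.
refresh_token_if_needed() {
    now=$(date +%s)
    if [ -z "${DEMO_ACCESS_TOKEN:-}" ] ||
        token_needs_refresh "${DEMO_TOKEN_ISSUED:-0}" "$now"; then
        DEMO_ACCESS_TOKEN=$(sudo bash -lc 'gen_access_token') || return 1
        DEMO_TOKEN_ISSUED=$now
        export DEMO_ACCESS_TOKEN DEMO_TOKEN_ISSUED
    fi
}
```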

---

## 🔐 Security & Best Practices

- **Replace insecure default credentials** (MinIO, CoreDHCP admin).
- **Use TLS** for API endpoints and registry.
- **Isolate VLANs** for provisioning traffic.
- **Harden** cloud-init scripts: avoid embedding secrets in plaintext.

---

## 📖 Further Reading & Feedback

- **OpenCHAMI Docs**: https://openchami.org
- **cloud-init Reference**: https://cloudinit.readthedocs.io
- **PXE/TFTP How-To**: https://wiki.archlinux.org/title/PXE
- **Give Feedback**: [Issue Tracker or Feedback Form Link]

---

© 2025 OpenCHAMI Project · Licensed under Apache 2.0

LA-UR-25-25073