initial commit (WIP)

David Allen 2025-07-18 18:26:49 -06:00
commit 6ad4cfb189
Signed by: towk
GPG key ID: 793B2924A49B3A3F
14 changed files with 554 additions and 0 deletions

Troubleshooting.md Normal file

@@ -0,0 +1,77 @@
Installing the services or booting nodes does not always go as expected. Whether your issue is related to the services or to configuration, this section lists problems you may run into when working with OpenCHAMI. Keep in mind that this list is updated as the software changes.
### Services Not Starting
### Certificate and TLS Errors
### Cannot Make Request to Service
#### Access Token Errors
If you receive errors related to the access token when making a request, there are a few things to check.
1. If you're making requests using the `ochami` CLI to services like SMD, make sure that the `ACCESS_TOKEN` environment variable is set.
2. If you're
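
The first check can be sketched as a small shell function, assuming a POSIX-style shell. The `check_access_token` name is hypothetical and is not part of the `ochami` CLI; it only reports whether the variable is set and non-empty, not whether the token is valid.

```shell
# Hypothetical helper: report whether ACCESS_TOKEN is set and non-empty.
# It does not validate the token itself.
check_access_token() {
  if [ -n "${ACCESS_TOKEN:-}" ]; then
    echo "ACCESS_TOKEN is set"
  else
    echo "ACCESS_TOKEN is not set" >&2
    return 1
  fi
}
```

Running `check_access_token` before calling `ochami` gives a quick yes/no answer without printing the token.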
### Cannot Discover Nodes
### Nodes Are Not Booting
When booting with iPXE, it is critical that the boot script specifies the correct images. Check the image names for typos and confirm that each image exists.
```bash
>>Start PXE over IPv4.
PXE-E18: Server response timeout.
BdsDxe: failed to load Boot0001 "UEFI PXEv4 (MAC:525400BEEF01)" from PciRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/MAC(525400BEEF01,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0): Not Found
>>Start PXE over IPv6.
```
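
One way to check the image references in a boot script is sketched here, assuming the script is saved as a local file and the images are referenced by `http://` or `https://` URLs. The `list_boot_urls` and `check_boot_urls` helpers are hypothetical names used only for illustration.

```shell
# Hypothetical helper: print each unique http(s) URL found in an iPXE boot script.
list_boot_urls() {
  grep -Eo 'https?://[^[:space:]]+' "$1" | sort -u
}

# Hypothetical helper: send a HEAD request to every URL in the boot script
# and report the ones that do not respond successfully.
check_boot_urls() {
  list_boot_urls "$1" | while read -r url; do
    curl --silent --head --fail "$url" >/dev/null || echo "unreachable: $url"
  done
}
```

Calling `check_boot_urls` with the path to a saved boot script prints one line per URL that could not be fetched, which helps separate a typo in an image name from a network problem.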
#### Node Cannot Make Request to S3
Sometimes the node may not be able to complete a request to the DHCP server to get the iPXE binary.
### Images Are Not Pushed to S3 Bucket
### Images Are Not Pushed to the OCI Registry
### Errors Caused By SELinux
If you see an error like the one below after using `s3cmd` to create your S3 buckets, try disabling SELinux and running the command again.
```bash
[rocky@devonb-tutorial-practice images]$ s3cmd setacl s3://boot-images --acl-public
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
An unexpected error has occurred.
Please try reproducing the error using
the latest s3cmd code from the git master
branch found at:
https://github.com/s3tools/s3cmd
and have a look at the known issues list:
https://github.com/s3tools/s3cmd/wiki/Common-known-issues-and-their-solutions-(FAQ)
If the error persists, please report the
following lines (removing any private
info as necessary) to:
s3tools-bugs@lists.sourceforge.net
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
```
If you make a `curl` request from inside the `minio-server` container and watch the `minio-server` logs, you should see an error when the request is made.
```bash
$ sudo podman exec -it minio-server curl localhost:9000
curl: (56) Recv failure: Connection reset by peer
```
Then make the request to the `minio-server` directly by its IP address.
```bash
$ curl 172.16.0.254:9000
curl: (52) Empty reply from server
```
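
To confirm whether SELinux is enforcing before retrying the commands above, a check along these lines may help. This is a sketch that assumes the standard `getenforce` and `setenforce` utilities are installed; the `selinux_mode` wrapper is a hypothetical helper. Note that `setenforce 0` switches SELinux to permissive mode only until the next reboot and does not disable it permanently.

```shell
# Hypothetical helper: print the current SELinux mode, or "unavailable"
# when the getenforce utility is not installed on this host.
selinux_mode() {
  if command -v getenforce >/dev/null 2>&1; then
    getenforce
  else
    echo "unavailable"
  fi
}

# Suggest a temporary switch to permissive mode only when SELinux is enforcing.
if [ "$(selinux_mode)" = "Enforcing" ]; then
  echo "SELinux is enforcing; try 'sudo setenforce 0' and rerun the s3cmd command"
fi
```

If the requests succeed in permissive mode, SELinux is the likely cause, and the policy can be adjusted later instead of leaving enforcement off.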