David J. Allen edited this page 2025-07-22 12:54:37 -06:00

Troubleshooting

Things don't always work as expected when installing the services or booting nodes. Whether your issue is related to the services or to configuration, this section lists issues you may run into when working with OpenCHAMI. Keep in mind that this list is updated continuously as the software changes.

Services Not Starting

Certificate and TLS Errors

Cannot Make Request to Service

Access Token Errors

Errors that deny you access usually state clearly that the required variable is not set or that your access token has expired. If you receive one of these errors, there are a few things to check.

  1. If you're making requests using the ochami CLI to services like SMD, make sure that the <name>_ACCESS_TOKEN environment variable is set.
  2. If you're making requests using curl to services like SMD, make sure that you are including the Authorization header in your request:
     curl https://demo.openchami.cluster:8443/hsm/v2/Inventory/EthernetInterfaces -H "Authorization: Bearer $ACCESS_TOKEN"
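
Both checks above can be scripted. The following is a minimal sketch, assuming the access token is a JWT that carries an `exp` claim; `jwt_exp` and `token_status` are hypothetical helper names and are not part of OpenCHAMI or the ochami CLI. It only reads the expiry time and does not verify the token's signature.

```shell
# Hypothetical helper: print the "exp" claim (Unix seconds) of a JWT.
# Assumes the token has the usual header.payload.signature layout.
jwt_exp() {
  # Take the payload segment and map base64url characters back to base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the padding that JWTs strip.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d 2>/dev/null |
    sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: print unset, unreadable, expired, or valid.
token_status() {
  if [ -z "$1" ]; then echo "unset"; return 1; fi
  exp=$(jwt_exp "$1")
  if [ -z "$exp" ]; then echo "unreadable"; return 1; fi
  if [ "$exp" -le "$(date +%s)" ]; then echo "expired"; return 1; fi
  echo "valid"
}

# Example (run manually), using the same variable as the curl command above:
#   token_status "$ACCESS_TOKEN"
```

If the result is `unset` or `expired`, export a fresh token before retrying the request.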

Cannot Discover Nodes

Nodes Are Not Booting

When booting with iPXE, it is critical that the boot script specifies the correct images. Check the image paths for typos and confirm that each image exists.

>>Start PXE over IPv4.
  PXE-E18: Server response timeout.
BdsDxe: failed to load Boot0001 "UEFI PXEv4 (MAC:525400BEEF01)" from PciRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x0)/MAC(525400BEEF01,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0): Not Found

>>Start PXE over IPv6.
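
One way to check the images named in a boot script is to pull out every URL and probe it. This is a rough sketch, assuming the boot script has been saved to a local file (`boot.ipxe` is a hypothetical name) and that the images are served over HTTP(S); both function names are illustrative. A server that rejects HEAD requests or requires authentication will be reported as missing even if the image exists.

```shell
# Print every http(s) URL referenced in an iPXE boot script read from stdin.
boot_script_urls() {
  grep -Eo 'https?://[^[:space:]]+' | sort -u
}

# Read URLs from stdin and report whether each one answers a HEAD request.
check_boot_urls() {
  while read -r url; do
    if curl -sfI --max-time 5 "$url" >/dev/null; then
      echo "ok      $url"
    else
      echo "missing $url"
    fi
  done
}

# Example (run manually):
#   boot_script_urls < boot.ipxe | check_boot_urls
```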

Node Cannot Make Request to S3

Sometimes the node may not be able to complete a request to the DHCP server to get the iPXE binary.

Images Are Not Pushed to S3 Bucket

Images Are Not Pushed to the OCI Registry

Errors Caused By SELinux

If you see an error like the one below after using s3cmd to create your S3 buckets, try disabling SELinux and running the command again.

[rocky@devonb-tutorial-practice images]$ s3cmd setacl s3://boot-images --acl-public

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    An unexpected error has occurred.
  Please try reproducing the error using 
  the latest s3cmd code from the git master
  branch found at:                                 
    https://github.com/s3tools/s3cmd
  and have a look at the known issues list:
    https://github.com/s3tools/s3cmd/wiki/Common-known-issues-and-their-solutions-(FAQ)
  If the error persists, please report the
  following lines (removing any private
  info as necessary) to:                           
   s3tools-bugs@lists.sourceforge.net


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

If you make a curl request from inside the minio-server container while watching the minio-server logs, you should see an error when the request is made.

$ sudo podman exec -it minio-server curl localhost:9000
curl: (56) Recv failure: Connection reset by peer

Then make the same request to the minio-server using its IP address.

$ curl 172.16.0.254:9000
curl: (52) Empty reply from server
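
Before changing anything, it can help to confirm that SELinux is actually enforcing. The sketch below assumes the standard SELinux utilities (`getenforce`, `setenforce`, and `ausearch` from the audit package) are installed; `selinux_mode` is a hypothetical helper name. Note that `setenforce 0` switches SELinux to permissive mode, which logs denials without blocking them, rather than disabling it outright, and that the change lasts only until the next reboot.

```shell
# Print the current SELinux mode, or "unavailable" if the tools are missing.
selinux_mode() {
  if command -v getenforce >/dev/null 2>&1; then
    getenforce
  else
    echo "unavailable"
  fi
}

echo "SELinux mode: $(selinux_mode)"

# If the mode is "Enforcing", these commands (run manually) help test the theory:
#   sudo ausearch -m avc -ts recent   # list recent SELinux denials (needs auditd)
#   sudo setenforce 0                 # permissive until the next reboot
#   sudo setenforce 1                 # back to enforcing afterwards
```

If the s3cmd and curl requests succeed in permissive mode, SELinux is the likely cause.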