Lifecycle

This page describes the lifecycle of a NitroEnclaveDeployment. A NitroEnclaveDeployment creates a Kubernetes Deployment that brings up pods capable of launching Nitro Enclaves on the nodes they are scheduled to.
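As a rough sketch, a NitroEnclaveDeployment follows the usual shape of a Kubernetes custom resource. The apiVersion and every field below are placeholders for illustration only; consult the CRD reference for the authoritative schema:

    apiVersion: example.com/v1alpha1        # placeholder group/version
    kind: NitroEnclaveDeployment
    metadata:
      name: my-enclave-app
    spec:
      replicas: 2                           # hypothetical field: number of enclave pods
      containers:                           # hypothetical field: user containers to run inside the enclave
        - name: app
          image: registry.example.com/app:latest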

After it is created, a NitroEnclaveDeployment moves to either a SCHEDULED or an ERROR status, depending on whether the controller was able to schedule a Deployment for it. This is reflected in the STATUS field of the NitroEnclaveDeployment.

The enclave pods take some time to become ready, as each pod has to pull its container images and boot the enclave. The READY field of the NitroEnclaveDeployment reports how many of its replicas are ready.
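For example, both fields can be read with kubectl (the ned short name is the one used later on this page; the output columns shown are illustrative):

    kubectl get ned my-enclave-app
    # NAME             READY   STATUS
    # my-enclave-app   2/2     SCHEDULED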

While the enclave pods are running, the orchestrator inside each enclave can restart containers to recover from certain faults.

Enclave Pod Lifetime

Pods are ephemeral entities: they are designed to be temporary and can be created and destroyed as needed. This ephemeral nature is particularly important for Nitro Enclaves, which provide secure, isolated environments for sensitive workloads.

Key Points:

  • Ephemeral Nature: Enclave pods are not meant to be long-lived. They are created when a NitroEnclaveDeployment is created and deleted once the parent NitroEnclaveDeployment is deleted.
  • Rescheduling: Enclave pods, like regular pods, can be rescheduled to different nodes if the original node fails or becomes unavailable. Unlike regular pods, however, they also bring up Nitro Enclaves on their host nodes.
  • Fault Handling: The orchestrator inside the enclaves can restart containers to handle some faults. This ensures that the services running inside the enclave remain available and functional.
  • Lifecycle Management: The lifecycle of the enclave pods is managed by the Kubernetes controller, which schedules them to nodes that support enclaves and have the AWS Nitro Enclaves device plugin installed (one way to check a node is shown below).
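One way to confirm that a node can host enclave pods is to check whether it advertises the enclave device resource. The resource name aws.ec2.nitro/nitro_enclaves is the one advertised by the AWS Nitro Enclaves device plugin; adjust it if your installation differs:

    # The resource should appear under the node's Capacity and Allocatable sections
    kubectl describe node <node-name> | grep nitro_enclaves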

Nitro Enclave Bootup

When a Nitro Enclave is created, its lifecycle is managed by the enclave pod, which the Kubernetes controller schedules to nodes that support enclaves and have the AWS Nitro Enclaves device plugin installed.

Once the Nitro Enclave is ready with the base services running, the manifest sync process begins. The manifest, derived from the NitroEnclaveDeployment spec, contains all the information the enclave needs to spin up the user containers or plugins. It is mounted into the pod, and services inside the pod then sync it to the enclave.

The kubelet running inside the enclave then uses this manifest to spin up all the services as per the specifications. This includes both the base services and the user-defined containers or plugins.
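The manifest format itself is internal, but as a mental model it can be pictured as a distilled copy of the NitroEnclaveDeployment spec. Every field name below is hypothetical:

    # Hypothetical manifest contents (illustrative only)
    containers:
      - name: app
        image: registry.example.com/app:latest
        restartPolicy: Always
    plugins: []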

Handling Problems with the NitroEnclaveDeployment

Container failures within enclaves are handled by the orchestrator inside the enclave, according to the restartPolicy defined for each user container or plugin in the NitroEnclaveDeployment resource, as sketched below.
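As a sketch, the policy might be declared per container; the field placement below is an assumption for illustration, not the authoritative schema:

    spec:
      containers:
        - name: app
          image: registry.example.com/app:latest
          restartPolicy: OnFailure          # hypothetical placement; enforced by the in-enclave orchestrator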

If the Nitro Enclave becomes unhealthy, it is taken out of the list of healthy pods, and no traffic is routed to it until it becomes healthy again.

Investigating Nitro Enclave Errors

To investigate the root cause of a Nitro Enclave error, a user can take the following steps (a combined example follows the list):

  1. Inspect Events: Use kubectl describe ned <name-of-nitro-enclave-deployment> to see events for the NitroEnclaveDeployment, which can provide hints about configuration or runtime issues.
  2. Check Enclave Logs: Check the enclave orchestrator logs on your configured log backend. This is often the most direct way to diagnose the issue causing crashes.
  3. Check Pod Logs: Use kubectl logs <name-of-the-pod> to check logs of the pod managing the enclave. The pod frequently runs health checks on the enclaves and logs any issues it finds.
  4. Check Controller Logs: Use kubectl logs <name-of-the-oblv-controller-pod> to check logs of the controller.
  5. Review Configuration: Ensure that the NitroEnclaveDeployment configuration, including environment variables and mounted volumes, is correct and that all required external resources are available.
  6. Review Permissions: Ensure that the pods that spin up the enclaves and set up networking to them have the necessary permissions.
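A typical first pass through these steps, using only the commands listed above (all names are placeholders):

    # Step 1: events on the NitroEnclaveDeployment
    kubectl describe ned <name-of-nitro-enclave-deployment>

    # Step 3: logs of the pod managing the enclave
    kubectl logs <name-of-the-pod>

    # Step 4: logs of the controller
    kubectl logs <name-of-the-oblv-controller-pod>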