
Kubernetes can be complex to use, and many potential issues can affect the workloads you run on it. Troubleshooting Kubernetes can be equally challenging. It is often easy to spot a symptom, such as an unavailable cluster or an unresponsive pod, but much harder to determine the cause and resolve the issue.

This article outlines some common troubleshooting scenarios in Kubernetes and how you can address them.

Top 5 Kubernetes Coding Errors

The following are the most common coding errors in Kubernetes.

Exit Code 1

This error code indicates that the termination of a container was the result of an invalid reference or application error:

Application errors: these range from simple programming errors in the code the container runs (e.g., “divide by zero”) to more advanced errors related to the runtime environment (e.g., the Python or Java runtime).
Invalid references: these occur when a file referred to by the image specification cannot be found in the container image.

Solution:

If you encounter Exit Code 1, take the following steps:

Check the container log to see whether any file listed in the image specification could not be found. If so, there is an invalid reference issue. Edit the image specification so that it points to the correct file name and path.
If there is no invalid file reference, look for other issues such as an application error. Check your container logs to identify the library or code path that raised the error and debug it.
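
Before reading the logs, it helps to confirm which container exited and with which code. The following is a minimal sketch, assuming the pod description has been saved as JSON (for example, the output of kubectl get pod with the -o json option). It reads the containerStatuses entries of the pod status. The helper name terminated_exit_codes and the sample data are illustrative and not part of any library.

```python
import json


def terminated_exit_codes(pod_json: str) -> dict:
    """Map container name -> (exit code, reason) for terminated containers.

    The current state is checked first, then the last state, because a
    restarted container reports its previous exit under `lastState`.
    """
    pod = json.loads(pod_json)
    results = {}
    for status in pod.get("status", {}).get("containerStatuses", []):
        for field in ("state", "lastState"):
            terminated = status.get(field, {}).get("terminated")
            if terminated:
                results[status["name"]] = (
                    terminated.get("exitCode"),
                    terminated.get("reason", ""),
                )
                break
    return results


# Hypothetical sample: one container that last exited with code 1.
SAMPLE_POD = json.dumps({
    "status": {
        "containerStatuses": [
            {
                "name": "web",
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                "lastState": {"terminated": {"exitCode": 1, "reason": "Error"}},
            }
        ]
    }
})

if __name__ == "__main__":
    print(terminated_exit_codes(SAMPLE_POD))
```

With the container name in hand, the logs of that specific container are the next place to look for the missing file or the application error.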

Exit Code 125

This error code indicates that the container failed to run. It occurs when the command invoked in the system shell to start the container, such as docker run, does not execute successfully. Common causes for Exit Code 125 include:

The use of an undefined flag in the command—for instance, docker run --abcd.
A user listed in the image specification lacks the necessary permissions.
The container engine is incompatible with the host hardware or operating system.

Solution:

If your container was terminated with Exit Code 125, use the following steps:

Verify the command attempting to run a container has the correct syntax.
Verify that the user attempting to run the container has the appropriate permissions. The user named in the image specification must be allowed to create containers on the host in the context in which the command executes.
Try alternative ways of starting the container offered by your container engine. For example, in Docker you might use the docker start command instead of docker run.
Check if you can use the same context or username to run other containers on the host. If the host does not successfully run any container, consider reinstalling the container engine. Alternatively, address underlying compatibility issues between the host setup and container engine.
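
When scripting these checks, it can be useful to separate failures of the container engine from failures of the program inside the container. Docker documents that docker run returns 125 when the error lies with the Docker daemon itself, 126 when the contained command cannot be invoked, 127 when the contained command cannot be found, and otherwise the exit code of the contained command. The sketch below encodes that convention. The function name classify_run_exit is hypothetical, and other container engines may use different codes.

```python
def classify_run_exit(code: int) -> str:
    """Describe a `docker run` exit status using Docker's documented convention."""
    if code == 0:
        return "success"
    if code == 125:
        # The engine failed before the container's command ever started.
        return "container engine error: check flags, permissions, and host setup"
    if code == 126:
        return "contained command could not be invoked"
    if code == 127:
        return "contained command could not be found"
    # Any other value is the exit code of the program inside the container.
    return f"contained command exited with code {code}"


if __name__ == "__main__":
    for status in (0, 1, 125, 126):
        print(status, "->", classify_run_exit(status))
```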

Exit Code 126

This error code indicates a failure to invoke the command in your container specification. Typical causes of command invocation errors include missing dependencies, a command file that lacks execute permission, and flaws in the continuous integration script running the container.

Solution:

If your container terminates with Exit Code 126, implement the following steps:

Search your container logs for the command that the system failed to invoke.
You can verify that the command is the source of the error by performing a trial run of the container specification without it and seeing if it succeeds.
Ensure the command has the proper syntax and can access all the dependencies.
After troubleshooting, you can adjust the container specification and run the container again to verify that you’ve fixed the issue.
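
One cause that is easy to reproduce outside a cluster is a command file without the execute bit: POSIX shells return 126 when a command is found but cannot be executed. The sketch below, which assumes a POSIX system with sh on the PATH, creates a throwaway script without execute permission and returns the shell's exit status. The file name entrypoint.sh and the function name are hypothetical.

```python
import os
import shlex
import subprocess
import tempfile


def exit_code_for_non_executable_script() -> int:
    """Run a script that lacks the execute bit and return the shell's exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "entrypoint.sh")
        with open(script, "w") as handle:
            handle.write("#!/bin/sh\necho hello\n")
        os.chmod(script, 0o644)  # readable, but not executable
        completed = subprocess.run(
            ["sh", "-c", shlex.quote(script)],
            capture_output=True,
            text=True,
        )
        return completed.returncode


if __name__ == "__main__":
    print(exit_code_for_non_executable_script())
```

If the same command works after the file is made executable, the fix belongs in the image build or the container specification rather than in the application.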

Kubernetes PVC Issues

These errors affect Kubernetes PersistentVolumeClaims (PVCs). A PVC enables a pod to mount a Kubernetes PersistentVolume (PV). PVCs are complex mechanisms, and their errors are often challenging to diagnose and address. They usually fall into one of these categories:

PV creation issues: Kubernetes fails to create a persistent volume or enable access to it, despite the availability of the necessary underlying storage resources.
PV provisioning issues: Kubernetes fails to create a persistent volume because the required storage resources are unavailable.
Spec changes: Kubernetes cannot connect the persistent volume to a pod because the PVC or PV specification has changed.

Different PVC issues can occur at various stages of the persistent volume lifecycle. Examples of common errors in this category:

FailedAttachVolume: the volume fails to attach to the current node, often because it has not detached from the previous node.
FailedMount: the volume fails to mount onto the specified path. This error is often a result of a FailedAttachVolume error, but not always. In some cases, the volume has successfully detached and is available to mount, and another issue prevents it from mounting onto the desired path.
CrashLoopBackOff: the pod repeatedly crashes and restarts in a continuous loop. This error is sometimes due to issues with the PVC (e.g., corrupted data on the volume), although it can also result from other causes.
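
The first two errors surface as events on the affected pod, so filtering the event list is a quick way to find them. The following is a rough sketch, assuming the events of a namespace have been saved as JSON (for example, the output of kubectl get events with the -o json option). The function name volume_failures, the pod name, and the sample messages are hypothetical. CrashLoopBackOff is left out because it is reported as a container state rather than as an event reason.

```python
import json

# Event reasons discussed above that relate to volumes.
VOLUME_EVENT_REASONS = {"FailedAttachVolume", "FailedMount"}


def volume_failures(events_json: str) -> list:
    """Return (object name, reason, message) for volume-related failure events."""
    events = json.loads(events_json)
    failures = []
    for event in events.get("items", []):
        if event.get("reason") in VOLUME_EVENT_REASONS:
            involved = event.get("involvedObject", {})
            failures.append(
                (involved.get("name", ""), event["reason"], event.get("message", ""))
            )
    return failures


# Hypothetical sample data; real messages will differ.
SAMPLE_EVENTS = json.dumps({
    "items": [
        {
            "reason": "FailedAttachVolume",
            "message": "hypothetical: volume is still attached to another node",
            "involvedObject": {"kind": "Pod", "name": "db-0"},
        },
        {
            "reason": "Scheduled",
            "message": "hypothetical: pod assigned to a node",
            "involvedObject": {"kind": "Pod", "name": "db-0"},
        },
        {
            "reason": "FailedMount",
            "message": "hypothetical: timed out waiting for the volume",
            "involvedObject": {"kind": "Pod", "name": "db-0"},
        },
    ]
})

if __name__ == "__main__":
    for name, reason, message in volume_failures(SAMPLE_EVENTS):
        print(f"{name}: {reason}: {message}")
```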

DaemonSet Issues

Kubernetes DaemonSets manage the life cycle and scheduling of pods to ensure that a copy of a pod runs on every eligible node in the cluster.

DaemonSets are considered unhealthy when they don’t have exactly one pod per node. DaemonSets are often unhealthy due to pending pods or pods stuck in crash loops. DaemonSet errors often originate from the nodes that are scheduled to run the pods.
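
The DaemonSet status reports the counts needed to check the "one pod per node" condition. The following minimal sketch assumes the status section of a DaemonSet has already been loaded into a dictionary, for example from JSON output. The function name daemonset_health and the sample numbers are illustrative.

```python
def daemonset_health(status: dict) -> dict:
    """Summarize a DaemonSet's status block.

    `desiredNumberScheduled` is the number of nodes that should run the pod,
    `numberReady` the number of nodes where it is running and ready, and
    `numberMisscheduled` the number of nodes running it that should not.
    """
    desired = status.get("desiredNumberScheduled", 0)
    ready = status.get("numberReady", 0)
    misscheduled = status.get("numberMisscheduled", 0)
    return {
        "desired": desired,
        "ready": ready,
        "not_ready": max(desired - ready, 0),
        "misscheduled": misscheduled,
        "healthy": desired == ready and misscheduled == 0,
    }


# Hypothetical status: three nodes should run the pod, only two are ready.
SAMPLE_STATUS = {
    "desiredNumberScheduled": 3,
    "currentNumberScheduled": 3,
    "numberReady": 2,
    "numberMisscheduled": 0,
}

if __name__ == "__main__":
    print(daemonset_health(SAMPLE_STATUS))
```

A non-zero not_ready count points to the pending or crash-looping pods discussed next.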

A pod may enter a crash loop for various reasons, such as a lack of resources. Check the specification for resources that you can increase: for example, raising the memory or CPU requests and limits may enable pods to run for longer. Check the pod’s logs to troubleshoot further. If there is no apparent issue with resource usage, check the pod’s command. If the container terminates before it is supposed to, verify that the image used in the specification is correct.

If one or multiple pods in a DaemonSet are pending, this may indicate that there are insufficient resources for scheduling a pod on every node. You can use the following steps to resolve this issue:

Lower the requested memory and CPU of the DaemonSet.
Move some of the pods off the affected nodes to free up resources.
Scale up your nodes to make room for the DaemonSet’s pods.
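
To judge which of these steps is needed, compare the DaemonSet pod's requests with what is still free on a node. The sketch below does that arithmetic for CPU and memory. It is deliberately simplified: it handles only plain and millicore CPU values and the binary memory suffixes Ki, Mi, Gi, and Ti, and all function names and sample figures are hypothetical.

```python
def parse_cpu(quantity: str) -> int:
    """Return CPU in millicores: '250m' -> 250, '2' -> 2000, '0.5' -> 500."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


_MEMORY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_memory(quantity: str) -> int:
    """Return memory in bytes for values such as '128Mi' or '4Gi'."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


def fits(request: dict, allocatable: dict, in_use: dict) -> bool:
    """Check whether a pod's requests fit into a node's remaining capacity."""
    free_cpu = parse_cpu(allocatable["cpu"]) - parse_cpu(in_use["cpu"])
    free_memory = parse_memory(allocatable["memory"]) - parse_memory(in_use["memory"])
    return (
        parse_cpu(request["cpu"]) <= free_cpu
        and parse_memory(request["memory"]) <= free_memory
    )


# Hypothetical node: 2 CPUs and 4Gi allocatable, most of it already requested.
NODE = {"cpu": "2", "memory": "4Gi"}
IN_USE = {"cpu": "1800m", "memory": "3Gi"}

if __name__ == "__main__":
    print(fits({"cpu": "250m", "memory": "256Mi"}, NODE, IN_USE))
    print(fits({"cpu": "100m", "memory": "256Mi"}, NODE, IN_USE))
```

In this example the original request does not fit because only 200 millicores remain, while the lowered request does, which corresponds to the first step in the list above.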

You can prevent DaemonSets from running on specific nodes by modifying the taints of each node or tolerations of a DaemonSet. This approach helps prevent DaemonSets from scheduling pods to specialized nodes that might not have the required resources.
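
The matching between taints and tolerations can be sketched in a few lines. The code below is a simplified model of the rules: a toleration with the Exists operator matches on key alone (or on every taint when the key is empty), the default Equal operator also compares values, and an empty effect on the toleration matches any effect. It ignores tolerationSeconds, and the function names and the "dedicated" taint key are hypothetical.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    effect = toleration.get("effect", "")
    if effect and effect != taint.get("effect"):
        return False  # an empty effect would have matched any taint effect
    key = toleration.get("key", "")
    if toleration.get("operator", "Equal") == "Exists":
        # Exists with an empty key tolerates every taint.
        return key == "" or key == taint.get("key")
    # Equal: both key and value must match.
    return key == taint.get("key") and toleration.get("value", "") == taint.get("value", "")


def blocking_taints(taints: list, tolerations: list) -> list:
    """Return the taints that would keep a pod off a node."""
    return [
        taint
        for taint in taints
        # PreferNoSchedule is only a preference, so it never blocks scheduling.
        if taint.get("effect") in ("NoSchedule", "NoExecute")
        and not any(tolerates(toleration, taint) for toleration in tolerations)
    ]


# Hypothetical taint on a specialized node.
GPU_TAINT = {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}

if __name__ == "__main__":
    print(blocking_taints([GPU_TAINT], []))
    print(blocking_taints([GPU_TAINT], [{"key": "dedicated", "operator": "Exists"}]))
```

A DaemonSet whose pods carry no matching toleration stays off the tainted node, which is the behavior described above.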

If you don’t require DaemonSet functionality (i.e., one pod per node), you might use a Deployment instead. This option offers greater flexibility in determining the number of pods and their location.

Conclusion

In this article, I covered the most common Kubernetes coding errors and what you can do about them:

Exit Code 1: the container terminated because of an application error or an invalid file reference in the image specification.
Exit Code 125: the container failed to run, indicating a problem with the command used to start it, with user permissions, or with compatibility between the container engine and the host.
Exit Code 126: command invocation error, indicating that the command in the container specification could not be invoked, for example because of missing dependencies or incorrect syntax.
Kubernetes PVC Issues: failures to create, provision, attach, or mount a PersistentVolume, such as a volume that cannot detach from a previous node to attach to a new one.
DaemonSet Issues: failure to run one pod on every node, usually because pods are pending or stuck in crash loops due to insufficient resources.