CGROUPS
- cpu – guarantees a minimum and limits the maximum number of "CPU shares", so that no process is starved of CPU time.
- cpuacct – generates reports on CPU usage; accounts for the CPU time consumed by a process.
- cpuset – allows you to pin a process to specific cores. For example, it specifies that only certain processes have access to a certain core.
- memory – monitors and limits the amount of memory used by a process.
- blkio – sets limits on reading from and writing to block devices.
- cgroup v2 is the next version of the Linux cgroup API. cgroup v2 provides a unified control system with enhanced resource management capabilities.
- Java 15+ can use cgroup v2. Applications (on Java 15+) can be configured to use the container's quotas rather than all the resources available on the Kubernetes node (see the sketch after this list).
- Supported by k8s.
- Enhanced resource allocation management and isolation across multiple resources
- Unified accounting for different types of memory allocations (network memory, kernel memory, etc).
- The kubelet automatically detects that the OS is running on cgroup v2 and behaves accordingly, with no additional configuration required.
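A minimal sketch of a container-aware Java workload (the pod name, image, and flag value are assumptions, not from the source): the JVM reads the container's cgroup limits, and MaxRAMPercentage sizes the heap relative to the container's memory limit rather than the node's RAM.
#Example: container-aware Java heap sizing
apiVersion: v1
kind: Pod
metadata:
  name: java-app                      # hypothetical name
spec:
  containers:
  - name: app
    image: eclipse-temurin:17-jre     # any Java 15+ runtime image; this one is an assumption
    env:
    - name: JAVA_TOOL_OPTIONS
      value: "-XX:MaxRAMPercentage=75.0"   # heap = 75% of the 512Mi limit, not of node RAM
    resources:
      limits:
        memory: "512Mi"
        cpu: "1"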
CAPABILITIES
Permissions for a process to make certain system calls. Only a few dozen of them exist, e.g.:
- CAP_CHOWN – permission to change the UID and GID of a file
- CAP_KILL – permission to send signals (SIGTERM, SIGKILL, etc.)
- CAP_NET_BIND_SERVICE – permission to bind to ports with numbers below 1024
- and so on (an example follows below).
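A minimal sketch of managing capabilities per container via securityContext (pod and container names are made up): drop everything, then add back only NET_BIND_SERVICE so a non-root process can bind to port 80.
#Example: container capabilities
apiVersion: v1
kind: Pod
metadata:
  name: cap-demo            # hypothetical name
spec:
  containers:
  - name: web
    image: nginx
    securityContext:
      capabilities:
        drop: ["ALL"]                 # start with no capabilities
        add: ["NET_BIND_SERVICE"]     # allow binding to ports below 1024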
REQUESTS AND LIMITS
- Requests – a guaranteed amount of resources (if the node does not have enough free resources, the scheduler does not place the Pod on that node).
- Limits – the maximum amount of a resource. Nothing is guaranteed, i.e. the total of all limits can exceed the entire namespace quota; for example, you can set 999 trillion cores.
- If you set only limits, then requests = limits automatically.
- If you set only requests, then no limits are applied.
#Container resources example
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
        ephemeral-storage: "2Gi"
      limits:
        memory: "128Mi"
        cpu: "500m"
        ephemeral-storage: "4Gi"
How CPU requests work
- The CPU request is converted into cgroup "CPU shares" (in cgroup v1, 1 core = 1024 shares) and determines the container's share of CPU time when the node's CPUs are contended.
How CPU limits work
- cfs_period_us – the time period within which quota usage is accounted. Equals 100000 µs (100 ms).
- cfs_quota_us – the allowed amount of CPU time, in µs, per period.
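A sketch of how a limit turns into a CFS quota (pod name is made up): cpu: "500m" gives cfs_quota_us = 0.5 × 100000 = 50000, i.e. at most 50 ms of CPU time in every 100 ms period; beyond that the container is throttled.
#Example: CPU limit to CFS quota mapping
apiVersion: v1
kind: Pod
metadata:
  name: cfs-demo            # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      limits:
        cpu: "500m"         # cfs_quota_us = 50000 per cfs_period_us = 100000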
CPU Management Policy
vim /etc/systemd/system/kubelet.service
--cpu-manager-policy=static \
--kube-reserved=cpu=1,memory=2Gi,ephemeral-storage=1Gi \
--system-reserved=cpu=1,memory=2Gi,ephemeral-storage=1Gi \
- Allows you to assign dedicated cores to containers (via cpuset).
- Works only if the pod has the Guaranteed QoS class.
- The CPU request value must be an integer (as in the sketch below).
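A minimal sketch of a pod eligible for dedicated cores under the static policy (names are made up): requests equal limits, which yields the Guaranteed QoS class, and the CPU value is an integer.
#Example: pod eligible for exclusive cores
apiVersion: v1
kind: Pod
metadata:
  name: pinned-app          # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "2"            # integer, as the static policy requires
        memory: "1Gi"
      limits:
        cpu: "2"            # equal to requests => Guaranteed QoS
        memory: "1Gi"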
The role of the K8s scheduler in quota distribution
- Filtering – the scheduler selects suitable nodes. NodeResourcesFit is the scheduler plugin that checks node resources: it determines which nodes have enough resources for the Pod; some resources can be configured not to be checked.
- Scoring – evaluates the suitable nodes and selects the best one. Scoring strategies:
- LeastAllocated (default) – prefers the least utilized node.
- MostAllocated – prefers the most utilized node (bin-packing).
- RequestedToCapacityRatio – scores nodes by the ratio of requested resources to capacity, with configurable weights.
#Example to use scoringStrategy
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
profiles:
- pluginConfig:
  - args:
      scoringStrategy:
        resources:
        - name: cpu
          weight: 1
        type: MostAllocated
    name: NodeResourcesFit
Storage Resource Quota
- requests.storage – across all persistent volume claims, the sum of storage requests cannot exceed this value.
- persistentvolumeclaims – the total number of PersistentVolumeClaims that can exist in the namespace.
- <storage-class-name>.storageclass.storage.k8s.io/requests.storage – across all persistent volume claims associated with <storage-class-name>, the sum of storage requests cannot exceed this value.
- <storage-class-name>.storageclass.storage.k8s.io/persistentvolumeclaims – across all persistent volume claims associated with <storage-class-name>, the total number of persistent volume claims that can exist in the namespace.
gold.storageclass.storage.k8s.io/requests.storage: 500Gi
bronze.storageclass.storage.k8s.io/requests.storage: 100Gi
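A sketch of a ResourceQuota combining these keys (the quota name and the values other than the gold/bronze ones above are made up):
#Example: storage ResourceQuota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota       # hypothetical name
spec:
  hard:
    requests.storage: 600Gi
    persistentvolumeclaims: "10"
    gold.storageclass.storage.k8s.io/requests.storage: 500Gi
    bronze.storageclass.storage.k8s.io/requests.storage: 100Gi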
Ephemeral storage
- requests.ephemeral-storage – across all pods in the namespace, the sum of local ephemeral storage requests cannot exceed this value. The amount of free space that must be available on the node when the container is launched.
- limits.ephemeral-storage – across all pods in the namespace, the sum of local ephemeral storage limits cannot exceed this value. The maximum amount of ephemeral storage available to the pod.
- ephemeral-storage – same as requests.ephemeral-storage. Covers emptyDir volumes (except tmpfs-backed ones), container logs, and writable container layers. Since this storage is shared, if one container exhausts it, it runs out for all containers on the node.
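A sketch of a namespace quota for ephemeral storage (name and values are made up):
#Example: ephemeral storage ResourceQuota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ephemeral-quota     # hypothetical name
spec:
  hard:
    requests.ephemeral-storage: 10Gi
    limits.ephemeral-storage: 20Gi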
Quite obscure quotas
- count/<resource>.<group> – the maximum number of resources of this type in the namespace.
- count/widgets.example.com – an example for the widgets custom resource from the example.com API group.
Typical object counts (a quota sketch follows after this list):
- count/persistentvolumeclaims
- count/services
- count/secrets
- count/configmaps
- count/replicationcontrollers
- count/deployments.apps
- count/replicasets.apps
- count/statefulsets.apps
- count/jobs.batch
- count/cronjobs.batch
- Protection against errors – for example, the default limit is 110 pods per node.
- Prevents bad practices.
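A sketch of an object count quota (name and values are made up):
#Example: object count ResourceQuota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-counts       # hypothetical name
spec:
  hard:
    count/configmaps: "10"
    count/secrets: "20"
    count/deployments.apps: "5"
    count/cronjobs.batch: "3"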
PID limits
- A classic shell fork bomb that exhausts the process table unless PID limits are in place:
:(){ :|:& };:
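A sketch of capping processes per pod through the kubelet configuration (the limit value is an illustrative assumption):
#Example: kubelet PID limit
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
podPidsLimit: 4096          # maximum number of processes per pod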
Quotas for extended resources
- Extended resources – any resource configured by the cluster operator that comes from outside; k8s knows nothing about it and does not manage it in any way.
- Node level – tied to a node, i.e. each node has some amount of the resource. Often managed by a Device Plugin.
- Cluster level – shared by the entire cluster.
#correct:
requests.nvidia.com/gpu: "4"
#incorrect:
limits.nvidia.com/gpu: "4"
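A sketch of quotaing an extended resource (quota name is made up); only the requests.* form is accepted for extended resources, since their requests and limits must be equal anyway:
#Example: extended resource ResourceQuota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota           # hypothetical name
spec:
  hard:
    requests.nvidia.com/gpu: "4"   # limits.nvidia.com/gpu would be rejected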
Network quotas and network bandwidth
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        # Ingress bandwidth
        kubernetes.io/ingress-bandwidth: 100M
        # Egress bandwidth
        kubernetes.io/egress-bandwidth: 1G
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
- Limited via a CNI bandwidth plugin – this is not a quota in the usual k8s sense.
- Configured via pod annotations; implemented with a Token Bucket Filter.
Some other shared resources
- inodes – ephemeral container storage usually shares a single file system, so inodes are a shared resource.
- dentry cache – a file system cache that stores the relationships between files and the directories that contain them.
Conclusion
- Reduces the influence of containers on each other.
- Provides cluster stability.
- Ensures predictability of container performance.