K8s pod priority & outOfPods
Question
We had a situation where the k8s cluster (Kubernetes, or more specifically: ICP) was running out of pods after an update, resulting in "OutOfPods" error messages. The reason was a low "podsPerCore" setting, which we corrected afterwards. Until then, pods with a provided priorityClass (1000000) could not be scheduled, while others without a priorityClass (0) were scheduled. I assumed a different behaviour: I thought that the K8s scheduler would kill pods with no priority so that a pod with priority could be scheduled. Was I wrong?
That's just a question for understanding, because I want to guarantee that the priority pods are running, no matter what.
Thanks
Pod with Prio:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: ibm-anyuid-hostpath-psp
  creationTimestamp: "2019-12-16T13:39:21Z"
  generateName: dms-config-server-555dfc56-
  labels:
    app: config-server
    pod-template-hash: 555dfc56
    release: dms-config-server
  name: dms-config-server-555dfc56-2ssxb
  namespace: dms
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: dms-config-server-555dfc56
    uid: c29c40e1-1da7-11ea-b646-005056a72568
  resourceVersion: "65065735"
  selfLink: /api/v1/namespaces/dms/pods/dms-config-server-555dfc56-2ssxb
  uid: 7758e138-2009-11ea-9ff4-005056a72568
spec:
  containers:
  - env:
    - name: CONFIG_SERVER_GIT_USERNAME
      valueFrom:
        secretKeyRef:
          key: username
          name: dms-config-server-git
    - name: CONFIG_SERVER_GIT_PASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: dms-config-server-git
    envFrom:
    - configMapRef:
        name: dms-config-server-app-env
    - configMapRef:
        name: dms-config-server-git
    image: docker.repository..../infra/config-server:2.0.8
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 90
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: config-server
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 20
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: 250m
        memory: 600Mi
      requests:
        cpu: 10m
        memory: 300Mi
    securityContext:
      capabilities:
        drop:
        - MKNOD
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-v7tpv
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kub-test-worker-02
  priority: 1000000
  priorityClassName: infrastructure
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-v7tpv
    secret:
      defaultMode: 420
      secretName: default-token-v7tpv
Pod without Prio (just an example within the same namespace):
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: ibm-anyuid-hostpath-psp
  creationTimestamp: "2019-09-10T09:09:28Z"
  generateName: produkt-service-57d448979d-
  labels:
    app: produkt-service
    pod-template-hash: 57d448979d
    release: dms-produkt-service
  name: produkt-service-57d448979d-4x5qs
  namespace: dms
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: produkt-service-57d448979d
    uid: 4096ab97-5cee-11e9-97a2-005056a72568
  resourceVersion: "65065755"
  selfLink: /api/v1/namespaces/dms/pods/produkt-service-57d448979d-4x5qs
  uid: b112c5f7-d3aa-11e9-9b1b-005056a72568
spec:
  containers:
  - image: docker-snapshot.repository..../dms/produkt-service:0b6e0ecc88a28d2a91ffb1db61f8ca99c09a9d92
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8080
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: produkt-service
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8080
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources: {}
    securityContext:
      capabilities:
        drop:
        - MKNOD
      procMount: Default
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-v7tpv
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kub-test-worker-02
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-v7tpv
    secret:
      defaultMode: 420
      secretName: default-token-v7tpv
Answer
There could be a lot of circumstances that alter the work of the scheduler. There is documentation about it: Pod priority and preemption.
Be aware that this feature was deemed stable in version 1.14.0.
From the IBM perspective, please keep in mind that version 1.13.9 will be supported until 19 February 2020!
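As a side note on the original outage: podsPerCore is a kubelet setting, not a scheduler one, so it caps how many pods a node accepts regardless of priority. A sketch of where it lives (the field names are from the KubeletConfiguration API; the values here are only illustrative):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
podsPerCore: 10   # 0 disables the per-core limit
maxPods: 110      # upper bound regardless of core count

With a non-zero podsPerCore, the effective pod capacity of a node is roughly the smaller of maxPods and podsPerCore times the number of cores.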
You are correct that pods with lower priority should be replaced by pods with higher priority.
Let me elaborate on that with an example:
Let's assume a Kubernetes cluster with 3 nodes (1 master and 2 workers):
- By default you cannot schedule regular pods on the master node.
- The only worker node where pods can be scheduled has 8GB of RAM.
- The second worker node has a taint that disables scheduling.
This example is based on RAM usage, but it works the same way for CPU time.
There are 2 priority classes:
- zero priority (0)
- high priority (1 000 000)
YAML definition of the zero priority class:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: zero-priority
value: 0
globalDefault: false
description: "This is priority class for hello pod"
globalDefault: false means this class is not assigned by default to pods that have no priority class. Setting globalDefault: true on exactly one PriorityClass makes it the cluster-wide default for such pods.
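For comparison, such a default class could look like the following sketch (the name and value are hypothetical; only one PriorityClass in a cluster may set globalDefault: true):

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: default-priority   # hypothetical name
value: 1000
globalDefault: true
description: "Assigned to all pods that do not specify a priorityClassName"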
YAML definition of the high priority class:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This is priority class for goodbye pod"
To apply these priority classes you will need to invoke:
$ kubectl apply -f FILE.yaml
With the above objects you can create the deployments:
- hello - deployment with lower priority
- goodbye - deployment with high priority
YAML definition of the hello deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  selector:
    matchLabels:
      app: hello
      version: 1.0.0
  replicas: 10
  template:
    metadata:
      labels:
        app: hello
        version: 1.0.0
    spec:
      containers:
      - name: hello
        image: "gcr.io/google-samples/hello-app:1.0"
        env:
        - name: "PORT"
          value: "50001"
        resources:
          requests:
            memory: "128Mi"
      priorityClassName: zero-priority
Please take a specific look at this fragment:
resources:
  requests:
    memory: "128Mi"
priorityClassName: zero-priority
It will limit the number of pods because of the requested resources, and it assigns low priority to this deployment.
YAML definition of the goodbye deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: goodbye
spec:
  selector:
    matchLabels:
      app: goodbye
      version: 2.0.0
  replicas: 3
  template:
    metadata:
      labels:
        app: goodbye
        version: 2.0.0
    spec:
      containers:
      - name: goodbye
        image: "gcr.io/google-samples/hello-app:2.0"
        env:
        - name: "PORT"
          value: "50001"
        resources:
          requests:
            memory: "6144Mi"
      priorityClassName: high-priority
Also please take a specific look at this fragment:
resources:
  requests:
    memory: "6144Mi"
priorityClassName: high-priority
These pods will have a much higher request for RAM, and high priority.
There is not enough information to properly troubleshoot issues like this without extensive logs from many components, from the kubelet to the pods, nodes and deployments themselves.
Apply the hello deployment and see what happens:
$ kubectl apply -f hello.yaml
Get basic information about the deployment with the command:
$ kubectl get deployments hello
After a while the output should look like this:
NAME READY UP-TO-DATE AVAILABLE AGE
hello 10/10 10 10 9s
As you can see, all of the pods are ready and available. The requested resources were assigned to them.
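If you want to confirm which priority each pod actually received, one way (using kubectl's custom-columns output over fields of the pod spec) is:

$ kubectl get pods -o custom-columns=NAME:.metadata.name,PRIORITY:.spec.priority,CLASS:.spec.priorityClassName

Pods from the hello deployment should show priority 0 and class zero-priority.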
To get more details for troubleshooting purposes you can invoke:
$ kubectl describe deployment hello
$ kubectl describe node NAME_OF_THE_NODE
Example information about allocated resources from the above command:
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 250m (12%) 0 (0%)
memory 1280Mi (17%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
Apply the goodbye deployment and see what happens:
$ kubectl apply -f goodbye.yaml
Get basic information about the deployments with the command:
$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
goodbye 1/3 3 1 25s
hello 9/10 10 9 11m
As you can see, the goodbye deployment is there but only 1 pod is available. Despite the fact that goodbye has a much higher priority, the hello pods are still running.
Why is that happening? Check the node:
$ kubectl describe node NAME_OF_THE_NODE
Non-terminated Pods: (13 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
default goodbye-575968c8d6-bnrjc 0 (0%) 0 (0%) 6Gi (83%) 0 (0%) 15m
default hello-fdfb55c96-6hkwp 0 (0%) 0 (0%) 128Mi (1%) 0 (0%) 27m
default hello-fdfb55c96-djrwf 0 (0%) 0 (0%) 128Mi (1%) 0 (0%) 27m
Take a look at the requested memory for the goodbye pod. It is 6Gi, as described above.
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 250m (12%) 0 (0%)
memory 7296Mi (98%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
The memory usage is near 100%.
Getting information about a specific goodbye pod that is in Pending state will yield some more information:
$ kubectl describe pod NAME_OF_THE_POD_IN_PENDING_STATE
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 38s (x3 over 53s) default-scheduler 0/3 nodes are available: 1 Insufficient memory, 2 node(s) had taints that the pod didn't tolerate.
The goodbye pod was not created because there were not enough resources to satisfy its request. But there were still some resources left for the hello pods.
There is a scenario in which pods with lower priority are killed so that pods with higher priority can be scheduled:
Change the requested memory for the goodbye pod to 2304Mi. It will allow the scheduler to assign all of the required pods (3):
resources:
  requests:
    memory: "2304Mi"
You can delete the previous deployment and apply a new one with the memory parameter changed.
Invoke the command:
$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
goodbye 3/3 3 3 5m59s
hello 3/10 10 3 48m
As you can see, all of the goodbye pods are available.
The number of hello pods was reduced to make room for the pods with higher priority (goodbye).
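To verify that preemption actually happened, you can look at the events of the evicted hello pods; the default scheduler records a Preempted event on the victims (the exact event wording may vary between Kubernetes versions):

$ kubectl get events --field-selector reason=Preempted
$ kubectl describe pod NAME_OF_A_HELLO_POD   # look for a Preempted entry in the Events section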