K8s pod priority & outOfPods

Question

We had a situation where the k8s cluster ran out of pods after an update (Kubernetes, or more specifically: ICP), resulting in "OutOfPods" error messages. The reason was a low "podsPerCore" setting, which we corrected afterwards. Until then, pods with a provided priorityClass (1000000) could not be scheduled, while others without a priorityClass (0) were scheduled. I had assumed a different behaviour: I thought the K8s scheduler would kill pods with no priority so that a pod with priority could be scheduled. Was I wrong?

That's just a question for understanding, because I want to guarantee that the priority pods are running, no matter what.

Thanks

Pod with Prio:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: ibm-anyuid-hostpath-psp
  creationTimestamp: "2019-12-16T13:39:21Z"
  generateName: dms-config-server-555dfc56-
  labels:
    app: config-server
    pod-template-hash: 555dfc56
    release: dms-config-server
  name: dms-config-server-555dfc56-2ssxb
  namespace: dms
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: dms-config-server-555dfc56
    uid: c29c40e1-1da7-11ea-b646-005056a72568
  resourceVersion: "65065735"
  selfLink: /api/v1/namespaces/dms/pods/dms-config-server-555dfc56-2ssxb
  uid: 7758e138-2009-11ea-9ff4-005056a72568
spec:
  containers:
  - env:
    - name: CONFIG_SERVER_GIT_USERNAME
      valueFrom:
        secretKeyRef:
          key: username
          name: dms-config-server-git
    - name: CONFIG_SERVER_GIT_PASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: dms-config-server-git
    envFrom:
    - configMapRef:
        name: dms-config-server-app-env
    - configMapRef:
        name: dms-config-server-git
    image: docker.repository..../infra/config-server:2.0.8
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 90
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: config-server
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 20
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources:
      limits:
        cpu: 250m
        memory: 600Mi
      requests:
        cpu: 10m
        memory: 300Mi
    securityContext:
      capabilities:
        drop:
        - MKNOD
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-v7tpv
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kub-test-worker-02
  priority: 1000000
  priorityClassName: infrastructure
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-v7tpv
    secret:
      defaultMode: 420
      secretName: default-token-v7tpv

Pod without Prio (just an example within the same namespace):

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: ibm-anyuid-hostpath-psp
  creationTimestamp: "2019-09-10T09:09:28Z"
  generateName: produkt-service-57d448979d-
  labels:
    app: produkt-service
    pod-template-hash: 57d448979d
    release: dms-produkt-service
  name: produkt-service-57d448979d-4x5qs
  namespace: dms
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: produkt-service-57d448979d
    uid: 4096ab97-5cee-11e9-97a2-005056a72568
  resourceVersion: "65065755"
  selfLink: /api/v1/namespaces/dms/pods/produkt-service-57d448979d-4x5qs
  uid: b112c5f7-d3aa-11e9-9b1b-005056a72568
spec:
  containers:
  - image: docker-snapshot.repository..../dms/produkt-service:0b6e0ecc88a28d2a91ffb1db61f8ca99c09a9d92
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8080
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    name: produkt-service
    ports:
    - containerPort: 8080
      name: http
      protocol: TCP
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /actuator/health
        port: 8080
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    resources: {}
    securityContext:
      capabilities:
        drop:
        - MKNOD
      procMount: Default
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-v7tpv
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kub-test-worker-02
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-v7tpv
    secret:
      defaultMode: 420
      secretName: default-token-v7tpv

Answer

There could be a lot of circumstances that will alter the work of the scheduler. There is documentation about it: Pod Priority and Preemption.

Be aware that this feature was deemed stable in version 1.14.0.

From the IBM perspective, please keep in mind that version 1.13.9 will be supported until 19 February 2020!

You are correct that the pods with lower priority should be replaced with the pods with higher priority.

Let me elaborate on that with an example:

Let's assume a Kubernetes cluster with 3 nodes (1 master and 2 worker nodes):

  • By default, you cannot schedule regular pods on the master node.
  • The only worker node that can schedule pods has 8GB of RAM.
  • The second worker node has a taint that disables scheduling (a sketch of such a taint follows this list).
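
For reference, a taint like the one described could be applied with a command of this shape (a sketch; the taint key and value are illustrative and not part of the original answer):

$ kubectl taint nodes NAME_OF_THE_NODE dedicated=infra:NoSchedule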

This example is based on RAM usage, but it can be applied in the same manner to CPU time.

There are 2 priority classes:

  • zero priority (0)
  • high priority (1 000 000)

YAML definition of the zero priority class:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: zero-priority
value: 0
globalDefault: false
description: "This is priority class for hello pod"

The globalDefault field concerns pods that do not have a priorityClassName assigned: set to true, this class would be assigned to them by default, which is why it is left at false here. Note that only one PriorityClass in the cluster may have globalDefault set to true.
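
As a minimal sketch (the class name and description are hypothetical, not part of the original answer), a cluster-wide default would look like this:

# hypothetical example, not from the original answer
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: cluster-default
value: 0
globalDefault: true
description: "Hypothetical default class for pods without a priorityClassName"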

YAML definition of the high priority class:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This is priority class for goodbye pod"

To apply these priority classes you will need to invoke: $ kubectl apply -f FILE.yaml
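
To verify that both classes were created, you can list them (a standard kubectl check, not part of the original answer; built-in system classes will also appear, and the AGE values are illustrative):

$ kubectl get priorityclass
NAME            VALUE     GLOBAL-DEFAULT   AGE
high-priority   1000000   false            10s
zero-priority   0         false            12s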

With the above objects you can create the deployments:

  • hello - a deployment with low priority
  • goodbye - a deployment with high priority

YAML definition of the hello deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  selector:
    matchLabels:
      app: hello
      version: 1.0.0
  replicas: 10
  template:
    metadata:
      labels:
        app: hello
        version: 1.0.0
    spec:
      containers:
      - name: hello
        image: "gcr.io/google-samples/hello-app:1.0"
        env:
        - name: "PORT"
          value: "50001"
        resources:
          requests:
            memory: "128Mi"
      priorityClassName: zero-priority

Please take a closer look at this fragment:

        resources:
          requests:
            memory: "128Mi"
      priorityClassName: zero-priority

It will limit the number of pods because of the requested resources, and it will assign low priority to this deployment.
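
To put numbers on that (simple arithmetic, not spelled out in the original answer): 10 replicas x 128Mi = 1280Mi of requested memory, which fits comfortably on the 8GB worker node.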

YAML definition of the goodbye deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: goodbye
spec:
  selector:
    matchLabels:
      app: goodbye
      version: 2.0.0
  replicas: 3
  template:
    metadata:
      labels:
        app: goodbye
        version: 2.0.0
    spec:
      containers:
      - name: goodbye
        image: "gcr.io/google-samples/hello-app:2.0"
        env:
        - name: "PORT"
          value: "50001"
        resources:
          requests:
            memory: "6144Mi"
      priorityClassName: high-priority

Also please take a closer look at this fragment:

        resources:
          requests:
            memory: "6144Mi"
      priorityClassName: high-priority

These pods will have a much higher RAM request as well as high priority.
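
Again as arithmetic (an observation, not from the original answer): 3 replicas x 6144Mi = 18432Mi requested in total, while a single replica already claims 6Gi of the 8GB worker node, so at most one goodbye pod can fit at a time.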

There is not enough information to properly troubleshoot issues like this without extensive logs from many components, from the kubelet to the pods, nodes and deployments themselves.

Apply the hello deployment and see what happens: $ kubectl apply -f hello.yaml

Get basic information about the deployment with the command:

$ kubectl get deployments hello

After a while the output should look like this:

NAME    READY   UP-TO-DATE   AVAILABLE   AGE
hello   10/10   10           10          9s

As you can see, all of the pods are ready and available. The requested resources were assigned to them.

To get more details for troubleshooting purposes you can invoke:

  • $ kubectl describe deployment hello
  • $ kubectl describe node NAME_OF_THE_NODE

Example information about allocated resources from the above command:

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                250m (12%)    0 (0%)
  memory             1280Mi (17%)  0 (0%)
  ephemeral-storage  0 (0%)        0 (0%)

Apply the goodbye deployment and see what happens: $ kubectl apply -f goodbye.yaml

Get basic information about the deployments with the command: $ kubectl get deployments

NAME      READY   UP-TO-DATE   AVAILABLE   AGE
goodbye   1/3     3            1           25s
hello     9/10    10           9           11m

As you can see, the goodbye deployment exists but only 1 pod is available. Despite the fact that goodbye has a much higher priority, the hello pods are still running.

Why is that?

$ kubectl describe node NAME_OF_THE_NODE

Non-terminated Pods:          (13 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  default                     goodbye-575968c8d6-bnrjc                    0 (0%)        0 (0%)      6Gi (83%)        0 (0%)         15m
  default                     hello-fdfb55c96-6hkwp                       0 (0%)        0 (0%)      128Mi (1%)       0 (0%)         27m
  default                     hello-fdfb55c96-djrwf                       0 (0%)        0 (0%)      128Mi (1%)       0 (0%)         27m

Take a look at the requested memory for the goodbye pod. It is 6Gi, as described above.

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                250m (12%)    0 (0%)
  memory             7296Mi (98%)  0 (0%)
  ephemeral-storage  0 (0%)        0 (0%)
Events:              <none>

The memory usage is near 100%.

Getting information about the specific goodbye pod that is in the Pending state will yield some more information: $ kubectl describe pod NAME_OF_THE_POD_IN_PENDING_STATE

Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  38s (x3 over 53s)  default-scheduler  0/3 nodes are available: 1 Insufficient memory, 2 node(s) had taints that the pod didn't tolerate.

The goodbye pod was not created because there were not enough resources to satisfy its request, yet there were still some resources left for hello pods. Preemption could not help here: the first goodbye pod alone occupies 6Gi of the node's allocatable memory (roughly 7.3Gi, since the 7296Mi requested equals 98%), so even evicting every hello pod would not free the 6Gi a second goodbye pod requests.

There is a scenario in which the lower priority pods can be killed and the higher priority pods scheduled.

Change the requested memory for the goodbye pods to 2304Mi. This will allow the scheduler to assign all of the required pods (3):

        resources:
          requests:
            memory: "2304Mi"

You can delete the previous deployment and apply a new one with the memory parameter changed.

Invoke the command: $ kubectl get deployments

NAME      READY   UP-TO-DATE   AVAILABLE   AGE
goodbye   3/3     3            3           5m59s
hello     3/10    10           3           48m

As you can see, all of the goodbye pods are available.

The number of hello pods was reduced to make room for the pods with higher priority (goodbye).
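
If you want to observe the scheduler's decisions as they happen, listing recent cluster events (a standard kubectl invocation, not part of the original answer) is a quick way to do so:

$ kubectl get events --sort-by=.metadata.creationTimestamp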
