Sidecar containers in Kubernetes Jobs?

We use Kubernetes Jobs for a lot of batch computing here and I'd like to instrument each Job with a monitoring sidecar to update a centralized tracking system with the progress of a job.

The only problem is, I can't figure out what the semantics are (or are supposed to be) of multiple containers in a job.
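
For concreteness, the kind of two-container Job I mean looks roughly like this (the Job name and the main container's command are placeholders; the container names and images match the status output further down):

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: my-batch-job              # placeholder name
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: pod-template        # the real batch task; command is a placeholder
            image: luigi-reduce:0.2
            command: ["/bin/run-task"]
          - name: sidecar             # monitoring sidecar; here it just prints "hello" every second
            image: alpine
            command: ["/bin/sh", "-c", "while true; do echo hello; sleep 1; done"]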

I gave it a shot anyway (with an alpine sidecar that printed "hello" every second), and after my main task completed, the Jobs are considered Successful, and kubectl get pods on Kubernetes 1.2.0 shows:

    NAME                                         READY     STATUS      RESTARTS   AGE
    job-69541b2b2c0189ba82529830fe6064bd-ddt2b   1/2       Completed   0          4m
    job-c53e78aee371403fe5d479ef69485a3d-4qtli   1/2       Completed   0          4m
    job-df9a48b2fc89c75d50b298a43ca2c8d3-9r0te   1/2       Completed   0          4m
    job-e98fb7df5e78fc3ccd5add85f8825471-eghtw   1/2       Completed   0          4m

And if I describe one of those pods:

State:              Terminated
  Reason:           Completed
  Exit Code:        0
  Started:          Thu, 24 Mar 2016 11:59:19 -0700
  Finished:         Thu, 24 Mar 2016 11:59:21 -0700

Then GETing the yaml of the job shows information per container:

  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: 2016-03-24T18:59:29Z
      message: 'containers with unready status: [pod-template]'
      reason: ContainersNotReady
      status: "False"
      type: Ready
    containerStatuses:
    - containerID: docker://333709ca66462b0e41f42f297fa36261aa81fc099741e425b7192fa7ef733937
      image: luigi-reduce:0.2
      imageID: docker://sha256:5a5e15390ef8e89a450dac7f85a9821fb86a33b1b7daeab9f116be252424db70
      lastState: {}
      name: pod-template
      ready: false
      restartCount: 0
      state:
        terminated:
          containerID: docker://333709ca66462b0e41f42f297fa36261aa81fc099741e425b7192fa7ef733937
          exitCode: 0
          finishedAt: 2016-03-24T18:59:30Z
          reason: Completed
          startedAt: 2016-03-24T18:59:29Z
    - containerID: docker://3d2b51436e435e0b887af92c420d175fafbeb8441753e378eb77d009a38b7e1e
      image: alpine
      imageID: docker://sha256:70c557e50ed630deed07cbb0dc4d28aa0f2a485cf7af124cc48f06bce83f784b
      lastState: {}
      name: sidecar
      ready: true
      restartCount: 0
      state:
        running:
          startedAt: 2016-03-24T18:59:31Z
    hostIP: 10.2.113.74
    phase: Running

So it looks like my sidecar would need to watch the main process (how?) and exit gracefully once it detects it is alone in the pod? If this is correct, then are there best practices/patterns for this (should the sidecar exit with the return code of the main container? but how does it get that?)?

Update: After further experimentation, I've also discovered the following: if there are two containers in a pod, then the pod is not considered successful until all containers in it have returned with exit code 0.

Additionally, if restartPolicy: OnFailure is set on the pod spec, then any container in the pod that terminates with a non-zero exit code will be restarted in the same pod. This could be useful for a monitoring sidecar that counts the number of retries and deletes the Job after a certain number, to work around the lack of a max-retries option in Kubernetes Jobs at the moment.
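
For reference, that setting lives on the Job's pod template; a fragment like the following shows where it goes:

    spec:
      template:
        spec:
          restartPolicy: OnFailure    # containers that exit non-zero are restarted in the same pod
          containers:
          - name: pod-template
            image: luigi-reduce:0.2
          - name: sidecar
            image: alpine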

Solution

You can use the downward API to figure out your own pod name from within the sidecar, and then retrieve your own pod from the apiserver to look up the exit status of the other container. Let me know how this goes.
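
A rough sketch of that approach, assuming the pod's service account is allowed to GET its own pod and that curl is available in the sidecar image (the grep check is deliberately crude; a real sidecar would parse .status.containerStatuses properly):

    containers:
    - name: sidecar
      image: alpine                   # assumes curl has been added to this image
      env:
      - name: POD_NAME                # downward API: this pod's own name
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
      - name: POD_NAMESPACE
        valueFrom:
          fieldRef:
            fieldPath: metadata.namespace
      command:
      - /bin/sh
      - -c
      - |
        TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
        CA=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        while true; do
          # Fetch this pod's own object from the apiserver; .status.containerStatuses
          # also carries the main container's exitCode once it has terminated.
          POD_JSON=$(curl -s --cacert "$CA" -H "Authorization: Bearer $TOKEN" \
            "https://kubernetes.default.svc/api/v1/namespaces/$POD_NAMESPACE/pods/$POD_NAME")
          # Crude check: this sidecar is still running, so any "terminated" state in
          # the response must belong to the main container.
          echo "$POD_JSON" | grep -q '"terminated"' && exit 0
          sleep 5
        done

Once the main container shows a terminated state, the sidecar could also read its exitCode from the same response and exit with that code itself, so the Job's overall success or failure reflects the real task.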
