Sidecar containers in Kubernetes Jobs?
Question
We use Kubernetes Jobs for a lot of batch computing here, and I'd like to instrument each Job with a monitoring sidecar that updates a centralized tracking system with the progress of the job.
The only problem is, I can't figure out what the semantics are (or are supposed to be) of multiple containers in a job.
I gave it a shot anyway (with an alpine sidecar that printed "hello" every second), and after my main task completed, the Jobs are considered Successful and kubectl get pods in Kubernetes 1.2.0 shows:
NAME READY STATUS RESTARTS AGE
job-69541b2b2c0189ba82529830fe6064bd-ddt2b 1/2 Completed 0 4m
job-c53e78aee371403fe5d479ef69485a3d-4qtli 1/2 Completed 0 4m
job-df9a48b2fc89c75d50b298a43ca2c8d3-9r0te 1/2 Completed 0 4m
job-e98fb7df5e78fc3ccd5add85f8825471-eghtw 1/2 Completed 0 4m
And if I describe one of those pods:
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 24 Mar 2016 11:59:19 -0700
Finished: Thu, 24 Mar 2016 11:59:21 -0700
Then GETting the YAML of the Job shows information per container:
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2016-03-24T18:59:29Z
message: 'containers with unready status: [pod-template]'
reason: ContainersNotReady
status: "False"
type: Ready
containerStatuses:
- containerID: docker://333709ca66462b0e41f42f297fa36261aa81fc099741e425b7192fa7ef733937
image: luigi-reduce:0.2
imageID: docker://sha256:5a5e15390ef8e89a450dac7f85a9821fb86a33b1b7daeab9f116be252424db70
lastState: {}
name: pod-template
ready: false
restartCount: 0
state:
terminated:
containerID: docker://333709ca66462b0e41f42f297fa36261aa81fc099741e425b7192fa7ef733937
exitCode: 0
finishedAt: 2016-03-24T18:59:30Z
reason: Completed
startedAt: 2016-03-24T18:59:29Z
- containerID: docker://3d2b51436e435e0b887af92c420d175fafbeb8441753e378eb77d009a38b7e1e
image: alpine
imageID: docker://sha256:70c557e50ed630deed07cbb0dc4d28aa0f2a485cf7af124cc48f06bce83f784b
lastState: {}
name: sidecar
ready: true
restartCount: 0
state:
running:
startedAt: 2016-03-24T18:59:31Z
hostIP: 10.2.113.74
phase: Running
So it looks like my sidecar would need to watch the main process (how?) and exit gracefully once it detects it is alone in the pod? If that's correct, are there best practices/patterns for this (should the sidecar exit with the return code of the main container? and how would it get that?)?
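For concreteness, one pattern I could imagine (purely a sketch, assuming both containers mount a shared emptyDir volume, and that the main container writes a sentinel file plus its exit code there when it finishes; the paths and file names are my own convention) is a sidecar loop like:

```python
import os
import time

def wait_for_main(done_file, exitcode_file, poll_seconds=1.0):
    """Poll until the main container drops its sentinel file, then return
    the exit code it recorded there (0 if none was written)."""
    while not os.path.exists(done_file):
        time.sleep(poll_seconds)
    try:
        with open(exitcode_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # Sentinel exists but no readable exit code; assume success.
        return 0

# In the sidecar's entrypoint one would then do something like
#   sys.exit(wait_for_main("/shared/done", "/shared/exitcode"))
# so the sidecar terminates with the main container's exit code.
```

The appeal of this approach is that it needs no apiserver access from the sidecar, only a shared volume and a small wrapper around the main command to write the two files.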
**Update** After further experimentation, I've also discovered the following: if there are two containers in a pod, the pod is not considered successful until all containers in it return with exit code 0.
Additionally, if restartPolicy: OnFailure is set on the pod spec, then any container in the pod that terminates with a non-zero exit code will be restarted in the same pod. This could be useful for a monitoring sidecar that counts the number of retries and deletes the Job after a certain count (a workaround for the lack of a max-retries setting in Kubernetes Jobs at the time).
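As a sketch of that retry-counting workaround (the field names follow the pod status dump above; max_retries is a threshold you'd pick yourself, and actually deleting the Job through the API is left out), the sidecar could check its own pod's containerStatuses:

```python
def retries_exhausted(pod_status, container_name, max_retries):
    """True once the named container's restartCount reaches max_retries.

    pod_status is the 'status' mapping of a Pod as returned by the
    apiserver, the same shape as the YAML dump above. When this returns
    True, the sidecar would delete the Job via the API.
    """
    for cs in pod_status.get("containerStatuses", []):
        if cs.get("name") == container_name:
            return cs.get("restartCount", 0) >= max_retries
    # Container not listed yet (e.g. pod still initializing).
    return False
```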
You can use the downward API to find your own pod name from within the sidecar, then retrieve your own pod from the apiserver to look up the main container's exit status. Let me know how this goes.
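To sketch that (the pod name is assumed to arrive via a downward-API env var, here called MY_POD_NAME and populated from fieldRef: metadata.name; the actual HTTP fetch of the pod object from the apiserver is elided), the interesting part is picking the main container's state out of the pod status:

```python
import os

# Assumed downward-API injection: an env var whose valueFrom is
# fieldRef: metadata.name. Fetching the pod object of this name from
# the apiserver is left out of the sketch.
POD_NAME = os.environ.get("MY_POD_NAME", "")

def main_container_state(pod_status, container_name):
    """Return ("terminated", exit_code) if the named container has
    terminated, or ("running", None) while it is still up, given a Pod
    'status' mapping like the YAML dump in the question."""
    for cs in pod_status.get("containerStatuses", []):
        if cs.get("name") != container_name:
            continue
        term = cs.get("state", {}).get("terminated")
        if term is not None:
            return "terminated", term.get("exitCode")
        return "running", None
    raise KeyError("no container named %r in pod status" % container_name)
```

With the status shown in the question, this would report the pod-template container as terminated with exit code 0 while the sidecar is still running, which is exactly the signal the sidecar needs to exit with that same code.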