Monitoring and alerting on pod status or restart with Google Container Engine (GKE) and Stackdriver

Question

Is there a way to monitor the pod status and restart count of pods running in a GKE cluster with Stackdriver?

While I can see CPU, memory, and disk usage metrics for all pods in Stackdriver, there seems to be no way to get metrics about crashing pods, or about pods in a replica set being restarted after a crash.

I'm using a Kubernetes replica set to manage the pods, so when they crash they are respawned under a new name. As far as I can tell, the metrics in Stackdriver are keyed by pod name (which is unique for the lifetime of the pod), which doesn't seem very sensible.

Alerting on pod failures seems like such a natural thing to do that it is hard to believe it is not supported at the moment. As they stand, the monitoring and alerting capabilities that Stackdriver offers for Google Container Engine seem rather useless, because they are all bound to pods whose lifetime can be very short.

So if this doesn't work out of the box, are there known workarounds or best practices for monitoring continuously crashing pods?

Recommended answer

There is now a built-in metric, so it's easy to build a dashboard and/or alert on it without setting up custom metrics:

Metric: kubernetes.io/container/restart_count
Resource type: k8s_container
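
As a rough illustration of how this metric could address the pod-name concern from the question, here is a minimal Python sketch. It builds a Monitoring filter string for the metric and resource type named above, then sums restart increases per namespace and container so that short-lived pod names do not matter. The helper names (`build_filter`, `restarts_by_container`, `containers_over_threshold`), the input shape, and the sample data are all hypothetical and exist only for this example. The sketch assumes the metric is a cumulative counter per container and that the `k8s_container` resource carries `cluster_name`, `namespace_name`, `pod_name`, and `container_name` labels. It does not call the Monitoring API.

```python
from collections import defaultdict

# Metric and resource type taken from the answer above.
METRIC_TYPE = "kubernetes.io/container/restart_count"
RESOURCE_TYPE = "k8s_container"


def build_filter(cluster_name=None, namespace_name=None):
    """Build a Monitoring filter string for the restart-count metric.

    The optional arguments narrow the filter by resource label
    (assumed label names: cluster_name, namespace_name).
    """
    clauses = [
        f'metric.type = "{METRIC_TYPE}"',
        f'resource.type = "{RESOURCE_TYPE}"',
    ]
    if cluster_name:
        clauses.append(f'resource.labels.cluster_name = "{cluster_name}"')
    if namespace_name:
        clauses.append(f'resource.labels.namespace_name = "{namespace_name}"')
    return " AND ".join(clauses)


def restarts_by_container(series):
    """Sum restart increases per (namespace, container), ignoring pod names.

    `series` is a hypothetical, simplified shape: a list of dicts with
    'namespace_name', 'pod_name', 'container_name', and 'values'
    (cumulative restart counts, oldest first).
    """
    totals = defaultdict(int)
    for s in series:
        values = s["values"]
        # Increase over the window; clamp at zero in case the counter reset.
        increase = max(values[-1] - values[0], 0) if values else 0
        totals[(s["namespace_name"], s["container_name"])] += increase
    return dict(totals)


def containers_over_threshold(totals, threshold):
    """Return the (namespace, container) keys with at least `threshold` restarts."""
    return sorted(key for key, count in totals.items() if count >= threshold)


# Made-up sample data: two pods of the same replica set plus one stable worker.
SAMPLE_SERIES = [
    {"namespace_name": "default", "pod_name": "web-abc12",
     "container_name": "web", "values": [0, 1, 3]},
    {"namespace_name": "default", "pod_name": "web-xyz89",
     "container_name": "web", "values": [2, 2, 4]},
    {"namespace_name": "default", "pod_name": "worker-q1w2e",
     "container_name": "worker", "values": [5, 5, 5]},
]
```

Grouping by namespace and container name rather than pod name means that restarts from every pod of a replica set roll up into one number, which is the kind of aggregation an alerting policy on this metric would typically want.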
