如何提醒容器重新启动? [英] How can I alert for container restarted?

查看:389
本文介绍了如何提醒容器重新启动?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我喜欢使用Prometheus和cAdvisor监视容器,以便在容器重新启动时收到警报.我想知道是否有人对此有示例普罗米修斯警报.

I like to monitor the containers using Prometheus and cAdvisor so that when a container restart, I get an alert. I wonder if anyone have sample Prometheus alert for this.

推荐答案

我使用以下Prometheus警报规则来查找一个小时内可以重新启动容器的容器(可以修改为最大时间),这可能对您有所帮助.

I used the following Prometheus alert rule for finding container restarts in an hour(can be modified to max time), It may be helpful for you.

ALERT ContainerRestart/PodRestart
IF rate(kube_pod_container_status_restarts[1h]) * 3600 > 1
FOR 5s
LABELS {action_required = "true", severity="critical/warning/info"}
ANNOTATIONS {DESCRIPTION="Pod {{$labels.namespace}}/{{$labels.pod}} restarting more than once during last one hours.",
SUMMARY="Container {{ $labels.container }} in Pod {{$labels.namespace}}/{{$labels.pod}} restarting more than once times during last one hours."}

rate()

rate(v range-vector)计算范围矢量中时间序列的每秒平均增加速率.单调性中断(例如由于目标重新启动而导致的计数器重置)会自动进行调整.同样,计算会外推到时间范围的末尾,从而允许遗漏刮擦或刮擦周期与该范围的时间段不完全对齐. 以下示例表达式返回范围向量中每个时间序列在过去5分钟内测得的每秒HTTP请求速率:

rate(v range-vector) calculates the per-second average rate of increase of the time series in the range vector. Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for. Also, the calculation extrapolates to the ends of the time range, allowing for missed scrapes or imperfect alignment of scrape cycles with the range's time period. The following example expression returns the per-second rate of HTTP requests as measured over the last 5 minutes, per time series in the range vector:

rate(http_requests_total{job="api-server"}[5m])

rate只能与计数器一起使用.它最适合于警报和慢速计数器的图形显示.

rate should only be used with counters. It is best suited for alerting, and for graphing of slow-moving counters.

请注意,在将rate()与聚合运算符(例如sum())或随时间推移进行聚合的函数(任何以_over_time结尾的函数)结合使用时,请始终先采用rate(),然后再进行聚合.否则,当目标重新启动时,rate()无法检测到计数器重置.

Note that when combining rate() with an aggregation operator (e.g. sum()) or a function aggregating over time (any function ending in _over_time), always take a rate() first, then aggregate. Otherwise rate() cannot detect counter resets when your target restarts.

kube_pod_container_status_restarts_total

指标类型:计数器

标签/标签: container =容器名称,namespace = pod名称空间,pod = pod名称

Labels/Tags: container=container-name, namespace=pod-namespace,pod=pod-name

说明:每个吊舱的容器重新启动次数

Description: The number of container restarts per pod

这篇关于如何提醒容器重新启动?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆