Count alerts fired by Prometheus


Question

I have Prometheus running with some alerting rules defined, and I want statistics on the number of alerts it has fired.

I tried to count how many times an alert has fired using Grafana, but it doesn't work:

SUM(ALERTS{alertname="XXX", alertstate="firing"})

Is there a way to count how many times an alert has fired?

Recommended answer

Your query returns how many alerts are firing right now, not how many times each alert has fired.

I've found this query to (mostly) work with Prometheus 2.4.0 and later:

changes(ALERTS_FOR_STATE[24h])

It returns the number of times each alert went from "pending" to "firing" during the last 24 hours, so it only works for alerts that have a pending state in the first place (i.e. alerts with for: <some_duration> specified).

ALERTS_FOR_STATE is a Prometheus-internal metric, newly added as of 2.4.0, that is used to restore alert state after a Prometheus restart. It's not all that well documented (not at all, actually), but it seems to work.
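
To make the counting concrete, here is a minimal Python sketch of what changes() does to the samples of a single series. It is only an illustration, not Prometheus code. The sample values are made up, and the sketch assumes that each ALERTS_FOR_STATE sample holds the Unix time at which the alert became active, so the value changes whenever the alert is activated again.

```python
def changes(samples):
    """Count how often the value differs between consecutive samples,
    which is how PromQL's changes() treats one series in a range."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


# Hypothetical ALERTS_FOR_STATE samples for one alert inside the window.
# Assumption: each value is the Unix time at which the alert became active.
window = [
    1000, 1000, 1000,  # first activation, sampled three times
    4600, 4600,        # alert resolved, then became active again
    9200,              # third activation
]

print(changes(window))  # 2
```

Under that assumption the series above covers three activations but only two value changes, because the first activation in the window has no earlier sample to differ from. That may be one reason the query only "mostly" works.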

Oh, and if you want the results grouped by alert (or environment, or job, or whatever), you can sum the results by that label or set of labels:

sum by(alertname) (changes(ALERTS_FOR_STATE[24h]))

This will give you how many times each alert fired across jobs, environments, etc.
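
If you want these numbers outside Grafana, the grouped query can also be sent to the Prometheus HTTP API at /api/v1/query. The sketch below only builds the request URL and parses an instant-query response; it makes no network call. The base URL http://localhost:9090, the helper names, and the sample response are assumptions for illustration.

```python
import json
from urllib.parse import urlencode

QUERY = "sum by(alertname) (changes(ALERTS_FOR_STATE[24h]))"


def build_query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def counts_by_alert(payload):
    """Map alertname -> count from an instant-query JSON response."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    return {
        item["metric"].get("alertname", ""): float(item["value"][1])
        for item in body["data"]["result"]
    }


# Hypothetical server address; replace with your own Prometheus URL.
url = build_query_url("http://localhost:9090", QUERY)

# Made-up response in the shape the API uses for instant vector results.
sample = (
    '{"status":"success","data":{"resultType":"vector","result":'
    '[{"metric":{"alertname":"XXX"},"value":[1700000000,"3"]}]}}'
)
print(counts_by_alert(sample))  # {'XXX': 3.0}
```

Fetching the URL with any HTTP client and passing the response body to counts_by_alert would give one count per alert name.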
