How to snooze a Prometheus alert for a specific time
Problem description
I have run into a problem with a Prometheus memory alert. When I take a GitLab backup, memory usage climbs to 95%. I want to snooze the memory alert for a specific time window.
For example, if I take a backup at 2 AM, I need to snooze the Prometheus memory alert during that time. Is that possible?
Recommended answer
As Marcelo said, there is no way to schedule a silence, but if the backup runs at a regular interval (say every night from 2am to 3am), you can include that in the alert expression.
- alert: OutOfMemory
  # hour() is evaluated in UTC; ">= 2 <= 3" matches 02:00 through 03:59
  expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10 and on() absent(hour() >= 2 <= 3)
This can rapidly become tedious if you want to silence many rules (or if you want more complex inhibition schedules). In that case, you can use Alertmanager inhibition rules as follows.
The first step is to define an alert in Prometheus that fires during the time you want the inhibition to take place:
- alert: BackupHours
  expr: hour() >= 2 <= 3
  for: 1m
  labels:
    notification: none
  annotations:
    description: 'This alert fires during backup hours to inhibit others'
Remember to add a route in Alertmanager so that this alert itself does not send notifications:
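# "routes" sits under the top-level "route" key of the Alertmanager config
routes:
  - match:
      notification: none
    receiver: do_nothing
# a receiver with no integrations configured sends nothing
receivers:
  - name: do_nothing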
Then use inhibition rules to silence the target rules during that time:
inhibit_rules:
  - source_match:
      alertname: BackupHours
    target_match:
      # any other selection of alerts can go here
      alertname: OutOfMemory
Note that this only works out of the box for UTC, because hour() is evaluated in UTC. If you need to account for DST, it requires more boilerplate (with recording rules, for example).
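As a rough sketch of the recording-rule idea, the rules below shift the clock by a fixed offset before taking the hour. The rule names `backup:utc_offset_seconds` and `backup:local_hour` and the 3600-second offset are assumptions made for illustration, not part of the original answer. This handles only a fixed offset; following DST would additionally require the offset rule to change its value with the date.

```yaml
groups:
  - name: local_time
    rules:
      # Hypothetical rule: offset from UTC in seconds (3600 = UTC+1).
      # Assumed fixed here; a DST-aware version would compute it from the date.
      - record: backup:utc_offset_seconds
        expr: vector(3600)
      # Hour of the day after applying the offset above.
      - record: backup:local_hour
        expr: hour(vector(time()) + backup:utc_offset_seconds)
```

With these in place, the schedule alert could use `backup:local_hour >= 2 <= 3` instead of `hour() >= 2 <= 3`.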
As a side note, if you are monitoring your backup process, you may already have a metric that indicates a backup is under way. If so, you could use this metric to inhibit the other alerts, and you would not need to maintain a schedule.
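A minimal sketch of that variant, assuming a gauge named `gitlab_backup_in_progress` that is 1 while a backup runs (the metric name is hypothetical; substitute whatever your backup job actually exports). The first fragment is a Prometheus alerting rule, the second belongs in the Alertmanager configuration:

```yaml
# Prometheus rule: fires for as long as the backup is running
- alert: BackupInProgress
  # gitlab_backup_in_progress is a hypothetical metric name
  expr: gitlab_backup_in_progress == 1
  labels:
    notification: none
  annotations:
    description: 'Fires while a backup is running, to inhibit other alerts'
```

```yaml
# Alertmanager: inhibit the memory alert while BackupInProgress is firing
inhibit_rules:
  - source_match:
      alertname: BackupInProgress
    target_match:
      alertname: OutOfMemory
```

The same `notification: none` route shown earlier keeps this alert from notifying anyone.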