How to snooze a Prometheus alert for a specific time

Question

I have run into an issue with a Prometheus memory alert: when I take a backup of GitLab, memory usage goes up to 95%. I want to snooze the memory alert for a specific time.

For example, if I take a backup at 2 AM, I need to snooze the Prometheus memory alert. Is that possible?

Recommended answer

As Marcelo said, there is no way to schedule a silence, but if the backup runs at a regular interval (say, every night from 2 AM to 3 AM), you can build that window into the alert expression.

- alert: OutOfMemory
  # absent(...) returns nothing from 02:00 to 02:59 UTC, so the alert is dropped during the backup
  expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10 and on() absent(hour() >= 2 < 3)
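To make the gating logic easier to follow, here is a small Python model of what that expression does. It is not Prometheus itself, only a sketch under the assumption that `hour()` is evaluated in UTC (as PromQL does) and that the window is 02:00 to 02:59. `in_backup_window` and `out_of_memory_fires` are hypothetical helper names.

```python
from datetime import datetime, timezone


def in_backup_window(ts: datetime, start_hour: int = 2, end_hour: int = 3) -> bool:
    """Model of the PromQL filter `hour() >= 2 < 3`: true from 02:00 to 02:59 UTC."""
    hour = ts.astimezone(timezone.utc).hour  # PromQL hour() always uses UTC
    return start_hour <= hour < end_hour


def out_of_memory_fires(available: float, total: float, ts: datetime) -> bool:
    """Model of the OutOfMemory expression.

    absent(window) yields a sample only when the window filter is empty,
    so `and on()` passes the memory condition through only outside the window.
    """
    low_memory = available / total * 100 < 10
    return low_memory and not in_backup_window(ts)


if __name__ == "__main__":
    backup_time = datetime(2024, 1, 15, 2, 30, tzinfo=timezone.utc)
    normal_time = datetime(2024, 1, 15, 4, 0, tzinfo=timezone.utc)
    print(out_of_memory_fires(0.5, 16.0, backup_time))  # False: inside the window
    print(out_of_memory_fires(0.5, 16.0, normal_time))  # True: outside the window
```

Note that `hour() >= 2 <= 3` would keep the window open until 03:59, because `hour()` returns whole hours.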

This quickly becomes tedious if you want to silence many rules (or need more complex inhibition schedules). In that case, you can use Alertmanager inhibition rules as follows.

The first step is to define an alert in Prometheus that fires during the window in which you want the inhibition to apply:

- alert: BackupHours
  # 02:00 to 02:59 UTC (hour() always evaluates in UTC)
  expr: hour() >= 2 < 3
  for: 1m
  labels:
    notification: none
  annotations:
    description: 'This alert fires during backup hours to inhibit others'
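Because of `for: 1m`, this alert is only pending at first and starts firing roughly one minute into the window, so memory alerts in that first minute are not inhibited. The Python sketch below is a simplified model of the `for:` clause (not Prometheus code); evaluation times are assumed to be seconds since midnight UTC with a hypothetical 30-second evaluation interval.

```python
from typing import Callable, List, Optional


def alert_states(eval_times: List[int], condition: Callable[[int], bool],
                 for_seconds: int) -> List[str]:
    """Simplified model of an alerting rule with a `for:` clause."""
    active_since: Optional[int] = None
    states = []
    for t in eval_times:
        if condition(t):
            if active_since is None:
                active_since = t  # first evaluation where the expression is true
            held = t - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # condition broke: the pending timer resets
            states.append("inactive")
    return states


def backup_hours(t: int) -> bool:
    """Model of `hour() >= 2 < 3` for t seconds since midnight UTC."""
    return 2 <= t // 3600 < 3


if __name__ == "__main__":
    # Evaluate every 30 s from 01:59:00 to 02:02:00 UTC.
    times = list(range(7140, 7321, 30))
    print(alert_states(times, backup_hours, for_seconds=60))
```

Dropping `for:` from `BackupHours`, or starting its window a little before the backup, would close that gap.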

Remember to add a route in Alertmanager so that this alert is not sent anywhere:

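route:
  # ... your existing top-level route (default receiver, group_by, ...)
  routes:
    # child routes are tried in order, so keep this one first
    - match:
        notification: none
      receiver: do_nothing
receivers:
# ... your existing receivers
- name: do_nothing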
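Alertmanager walks child routes in order and, unless `continue: true` is set, stops at the first one whose matchers all match. The Python sketch below is a simplified model of that first-match behaviour (equality matchers only); the receiver names `pager` and `default` are hypothetical.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Route = Tuple[Labels, str]  # (equality matchers, receiver name)


def pick_receiver(labels: Labels, routes: List[Route], default: str) -> str:
    """Simplified Alertmanager routing: the first child route whose matchers all match wins."""
    for matchers, receiver in routes:
        if all(labels.get(name) == value for name, value in matchers.items()):
            return receiver
    return default  # no child route matched: the top-level receiver is used


if __name__ == "__main__":
    routes = [
        ({"notification": "none"}, "do_nothing"),
        ({"severity": "critical"}, "pager"),  # hypothetical second route
    ]
    print(pick_receiver({"alertname": "BackupHours", "notification": "none"}, routes, "default"))
    print(pick_receiver({"alertname": "OutOfMemory", "severity": "critical"}, routes, "default"))
```

If the `do_nothing` route were placed after a broader route, `BackupHours` could be caught by that route and notified.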

Then use an inhibition rule to mute the target alerts during that window:

inhibit_rules:
- source_match:
    alertname: BackupHours
  target_match:
    # any other alert selector works here
    alertname: OutOfMemory
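Conceptually, an inhibit rule mutes a target alert while at least one firing alert matches `source_match` (and agrees on any labels listed under `equal`). The following Python function is a simplified model of a single rule, not Alertmanager's implementation; it ignores regex matchers and the special case of an alert matching both sides.

```python
from typing import Dict, Iterable, Sequence

Labels = Dict[str, str]


def matches(labels: Labels, matchers: Labels) -> bool:
    """True if every equality matcher is satisfied by the alert's labels."""
    return all(labels.get(name) == value for name, value in matchers.items())


def inhibited(target: Labels, firing: Iterable[Labels], source_match: Labels,
              target_match: Labels, equal: Sequence[str] = ()) -> bool:
    """Simplified model of one inhibit rule applied to one target alert."""
    if not matches(target, target_match):
        return False
    return any(
        matches(source, source_match)
        and all(source.get(name) == target.get(name) for name in equal)
        for source in firing
    )


if __name__ == "__main__":
    backup = {"alertname": "BackupHours", "notification": "none"}
    oom = {"alertname": "OutOfMemory", "instance": "gitlab:9100"}  # hypothetical instance
    rule = ({"alertname": "BackupHours"}, {"alertname": "OutOfMemory"})
    print(inhibited(oom, [backup, oom], *rule))  # True: muted during backup hours
    print(inhibited(oom, [oom], *rule))          # False: BackupHours is not firing
```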

Note that this only works out of the box for times expressed in UTC, because `hour()` is evaluated in UTC. If the window has to follow local time with daylight saving time (DST), it takes more boilerplate (for example, recording rules).
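To see why a fixed UTC window drifts, this Python sketch computes the UTC hour of a backup scheduled at 02:00 local time on a winter and a summer day. The time zone `Europe/Berlin` is only an assumed example, and `zoneinfo` needs the system tz database (or the `tzdata` package) to be available.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def backup_hour_utc(year: int, month: int, day: int, local_hour: int, tz_name: str) -> int:
    """Return the UTC hour at which a job scheduled at local_hour (wall clock in tz_name) runs."""
    local = datetime(year, month, day, local_hour, tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc).hour


if __name__ == "__main__":
    # Hypothetical server clock: Europe/Berlin (UTC+1 in winter, UTC+2 in summer).
    print(backup_hour_utc(2024, 1, 15, 2, "Europe/Berlin"))  # 1
    print(backup_hour_utc(2024, 7, 15, 2, "Europe/Berlin"))  # 0
```

A rule hard-coded to one UTC hour would therefore be off by an hour for half of the year.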

As a side note, if you already monitor your backup process, you may already have a metric indicating that a backup is under way. If so, you can use that metric to inhibit the other alerts, and you won't need to maintain a schedule.
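If no such metric exists yet, one way to expose it is node_exporter's textfile collector (enabled with `--collector.textfile.directory`), which reads `*.prom` files from a directory. The Python sketch below assumes that setup; the metric name `backup_in_progress`, the file name `backup.prom`, and the helper `write_backup_metric` are hypothetical. A backup script would call it with `True` before starting and `False` when done, and an alert such as `expr: backup_in_progress == 1` could then act as the inhibition source in place of `BackupHours`.

```python
import os
import tempfile


def write_backup_metric(directory: str, in_progress: bool) -> str:
    """Write a hypothetical backup_in_progress gauge for node_exporter's textfile collector.

    Writes to a temporary file first and renames it, so the collector never reads a partial file.
    """
    path = os.path.join(directory, "backup.prom")
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")  # *.tmp is ignored by the collector
    with os.fdopen(fd, "w") as f:
        f.write("# HELP backup_in_progress 1 while a backup is running.\n")
        f.write("# TYPE backup_in_progress gauge\n")
        f.write(f"backup_in_progress {1 if in_progress else 0}\n")
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; node_exporter may run as another user
    os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        metric_file = write_backup_metric(d, True)
        with open(metric_file) as f:
            print(f.read())
```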
