Set Stackdriver alerts for specific error messages

Problem description

I cannot find a clean way to set up Stackdriver alert notifications for errors in Cloud Functions.

I am using a cloud function to process data into Cloud Datastore. There are 2 types of errors that I want to be alerted on:

  1. Technical exceptions that might cause the function to crash
  2. Custom errors that we are logging from the cloud function

I have done the following:

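  • Created a log metric that searches for the specific errors (though this does not work for crashes, as the error message is different every time)
  • Created an alert for this metric in Stackdriver Monitoring with the parameters in the code section below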

This was done as per the answer to the question "how to create alert per error in stackdriver".

For the first trigger of the condition I receive an email. However, on subsequent triggers, say on the next day, I don't. Also, the incident stays in the 'opened' state.

Resource type: cloud function
Metric: from point 2 above
Aggregation: Aligner: count, Reducer: None, Alignment period: 1m
Configuration: Condition triggers if: Any time series violates, Condition: is above, Threshold: 0.001, For: 1 min
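
For reference, these settings correspond roughly to the following alert policy body, written here as a Python dictionary using the field names of the Cloud Monitoring API's AlertPolicy resource. This is only a sketch: the metric name `custom_error_count` and the display names are hypothetical placeholders, not values from the original setup.

```python
import json

# Hypothetical name of the user-defined log-based metric; substitute your own.
METRIC_NAME = "custom_error_count"

policy = {
    "displayName": "Cloud Function custom errors",  # placeholder name
    "combiner": "OR",
    "conditions": [
        {
            "displayName": "Error count above threshold",  # placeholder name
            "conditionThreshold": {
                # User-defined log-based metrics live under logging.googleapis.com/user/
                "filter": (
                    f'metric.type="logging.googleapis.com/user/{METRIC_NAME}" '
                    'AND resource.type="cloud_function"'
                ),
                # Aligner: count, Reducer: None, Alignment period: 1m
                "aggregations": [
                    {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_COUNT"}
                ],
                # Condition: is above, Threshold: 0.001, For: 1 min
                "comparison": "COMPARISON_GT",
                "thresholdValue": 0.001,
                "duration": "60s",
                # "Any time series violates"
                "trigger": {"count": 1},
            },
        }
    ],
}

print(json.dumps(policy, indent=2))
```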

So I have 3 questions:

  1. Is this the right way to satisfy my requirement of creating alerts?
  2. How can I still receive alert notifications for subsequent errors?
  3. How can I set the incident to 'resolved', either automatically or manually?

Recommended answer

Normally, alerts resolve themselves once the alerting policy stops firing. The reason your alerts are not resolving is that your metric only writes non-zero points - if there are no errors, it doesn't write zero. That means that the policy never gets an unambiguous signal that everything is fine, so the alerts just sit there (they'll automatically close after 7 days, but I imagine that's not all that useful for you).
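
To make the effect concrete, here is a toy model in Python. It is not how Stackdriver is implemented: the function `incident_states` is hypothetical, and it ignores the duration window and the 7-day auto-close. It assumes only that an incident changes state when a data point actually arrives, and reuses the threshold from the question.

```python
from typing import Dict, List

THRESHOLD = 0.001  # threshold from the question's alert condition


def incident_states(points: Dict[int, float], minutes: int) -> List[str]:
    """Return the incident state after each minute under a simplified model.

    `points` maps minute -> metric value; a missing minute means no point
    was written at all (which is different from writing a zero).
    """
    state = "closed"
    history = []
    for minute in range(minutes):
        value = points.get(minute)  # None when nothing was written
        if value is not None:
            # Only an actual data point can change the state.
            state = "open" if value > THRESHOLD else "closed"
        history.append(state)
    return history


# Sparse metric: a point is written only when an error is logged.
sparse = {1: 1.0}
# Dense metric: an explicit zero is written for error-free minutes.
dense = {0: 0.0, 1: 1.0, 2: 0.0, 3: 0.0}

sparse_history = incident_states(sparse, 4)
dense_history = incident_states(dense, 4)
print(sparse_history)  # ['closed', 'open', 'open', 'open']
print(dense_history)   # ['closed', 'open', 'closed', 'closed']
```

With the sparse series, nothing ever tells the model that the error has gone away, so the incident stays open; with explicit zeros it closes on the next point.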

This is a common problem and a tricky one to solve. One possibility is to write your policy as a ratio of errors to something non-zero, like request count. As long as the request count is non-zero, the ratio will compute to zero if there are no errors, and so an alert on the ratio will automatically resolve. You need to be a bit careful about rounding errors, though - if your request count is high enough, you might miss a single error because the ratio could round to zero.
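
The arithmetic behind the ratio approach can be sketched in a few lines of Python. The helper `error_ratio`, the request counts, and the three-decimal rounding are all illustrative assumptions; the precision Stackdriver actually uses is not stated here.

```python
THRESHOLD = 0.001  # reused from the question; any small positive value works


def error_ratio(errors: int, requests: int) -> float:
    """Errors divided by requests; requires a non-zero request count."""
    if requests <= 0:
        raise ValueError("request count must be non-zero")
    return errors / requests


# No errors but some traffic: the ratio is an explicit zero, so a policy
# on the ratio receives a clear "all good" signal and can resolve.
quiet = error_ratio(0, 500)

# One error among few requests is well above the threshold.
small = error_ratio(1, 100)

# One error among many requests falls below the threshold, and would
# become exactly zero if the value were rounded to three decimals.
large = error_ratio(1, 1_000_000)

print(quiet, small > THRESHOLD, large > THRESHOLD, round(large, 3))
```

The last case is the caveat from the answer: at a high enough request count, a single error can disappear from the ratio.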

Aaron Sher, Stackdriver engineer
