Configure an SQS dead-letter queue to raise a CloudWatch alarm on receiving a message

Problem description

I am working with a dead-letter queue in Amazon SQS. Whenever the queue receives a new message, I want it to raise a CloudWatch alarm. The problem is that I configured an alarm on the queue's number_of_messages_sent metric, but this metric does not work as expected for dead-letter queues, as mentioned in the Amazon SQS Dead-Letter Queues - Amazon Simple Queue Service documentation.

One suggestion is to use number_of_messages_visible instead, but I am not sure how to configure it in an alarm. If I alarm whenever this metric is > 0, that is not the same as detecting a new message in the queue: if an old message is still there, the metric value will always be > 0. I could use some kind of math expression to get the delta in this metric over a defined period (say, one minute), but I am looking for a better solution.
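
The delta approach mentioned above could look roughly like the following CloudFormation sketch, which uses the CloudWatch metric math function DIFF on ApproximateNumberOfMessagesVisible. This is an illustration under assumptions, not a tested template: the logical IDs MyDeadLetterQueue and MyAlarmTopic, the alarm name, the 60-second period, and the notBreaching setting are all chosen here for the example.

```yaml
# Sketch only: resource names, period, and missing-data handling are assumptions.
DLQnewMessageAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: "Alarm when the visible message count in the DLQ grows between periods"
      Metrics:
        # Raw metric; not evaluated directly by the alarm.
        - Id: visible
          ReturnData: false
          MetricStat:
            Metric:
              Namespace: "AWS/SQS"
              MetricName: "ApproximateNumberOfMessagesVisible"
              Dimensions:
                - Name: "QueueName"
                  Value: !GetAtt MyDeadLetterQueue.QueueName
            Period: 60
            Stat: "Maximum"
        # Change relative to the previous datapoint; this is what the alarm watches.
        - Id: delta
          Expression: "DIFF(visible)"
          Label: "Change in visible DLQ messages"
          ReturnData: true
      EvaluationPeriods: 1
      Threshold: 0
      ComparisonOperator: "GreaterThanThreshold"
      TreatMissingData: "notBreaching"
      AlarmActions:
        - !Ref MyAlarmTopic
```

A positive delta means the queue grew during the period, so old messages sitting in the queue no longer keep the alarm on. One limitation: if a message arrives and another is removed within the same period, the delta is zero and the arrival goes unnoticed.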

Recommended answer

I struggled with the same problem, and the answer for me was to use NumberOfMessagesSent instead. That let me set my criteria for new messages that arrived during my configured period. Here is what worked for me in CloudFormation.

Note that individual alarms do not fire if the alarm stays in the ALARM state because of constant failures. You can set up another alarm to catch those, e.g. one that fires when 100 errors occur in 1 hour, using the same method.

Updated: The NumberOfMessagesReceived and NumberOfMessagesSent metrics depend on how the message is queued (per the SQS documentation, a message that SQS moves to the dead-letter queue after failed processing is not captured by NumberOfMessagesSent, whereas a message sent to it manually is). I therefore devised a new solution for our needs that uses the ApproximateNumberOfMessagesDelayed metric after adding a delay to the DLQ settings. If you are adding messages to the queue manually, NumberOfMessagesReceived will work. Otherwise, use ApproximateNumberOfMessagesDelayed after setting up a delay.

MyDeadLetterQueue:
    Type: AWS::SQS::Queue
    Properties:
      MessageRetentionPeriod: 1209600  # 14 days
      DelaySeconds: 60  # delay new messages so they show up in ApproximateNumberOfMessagesDelayed

DLQthresholdAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: "Alarm dlq messages when we have 1 or more failed messages in 10 minutes"
      Namespace: "AWS/SQS"
      MetricName: "ApproximateNumberOfMessagesDelayed"
      Dimensions:
        - Name: "QueueName"
          Value:
            Fn::GetAtt:
              - "MyDeadLetterQueue"
              - "QueueName"
      Statistic: "Sum"
      Period: 300  # seconds per datapoint
      DatapointsToAlarm: 1  # one breaching datapoint is enough
      EvaluationPeriods: 2  # look at the last 2 periods (10 minutes)
      Threshold: 1
      ComparisonOperator: "GreaterThanOrEqualToThreshold"
      AlarmActions:
        - !Ref MyAlarmTopic
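
The second alarm suggested in the note above (100 errors in 1 hour) might be sketched by reusing the same metric with a longer period and a higher threshold. The logical ID DLQhourlyVolumeAlarm and the TreatMissingData setting are assumptions added for illustration. Because ApproximateNumberOfMessagesDelayed is a sampled approximation, its hourly sum only roughly tracks the number of messages that arrived, so the threshold may need tuning.

```yaml
# Sketch only: a hypothetical companion alarm for sustained failures.
DLQhourlyVolumeAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: "Alarm when roughly 100 or more messages reach the dlq within 1 hour"
      Namespace: "AWS/SQS"
      MetricName: "ApproximateNumberOfMessagesDelayed"
      Dimensions:
        - Name: "QueueName"
          Value:
            Fn::GetAtt:
              - "MyDeadLetterQueue"
              - "QueueName"
      Statistic: "Sum"
      Period: 3600  # one 1-hour datapoint
      EvaluationPeriods: 1
      Threshold: 100
      ComparisonOperator: "GreaterThanOrEqualToThreshold"
      TreatMissingData: "notBreaching"  # assumption: no data means no failures
      AlarmActions:
        - !Ref MyAlarmTopic
```

Both alarms can publish to the same MyAlarmTopic. The first reports the initial failure, and this one reports continued failures while the first alarm remains in the ALARM state.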
