Configure an SQS dead-letter queue to raise a CloudWatch alarm on receiving a message
Problem description
I am working with a dead-letter queue in Amazon SQS. I want a CloudWatch alarm to be raised whenever the queue receives a new message. The problem is that I configured an alarm on the queue's number_of_messages_sent metric, but this metric does not work as expected for dead-letter queues, as mentioned in the Amazon SQS Dead-Letter Queues - Amazon Simple Queue Service documentation.
Some suggestions were to use number_of_messages_visible instead, but I am not sure how to configure this in an alarm. If I alarm when the value of this metric is > 0, that is not the same as detecting a new message in the queue: as long as an old message is still there, the metric value will always be > 0. I could use some kind of math expression to get the delta in this metric over a defined period (say, one minute), but I am looking for a better solution.
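For reference, the delta idea mentioned above could be sketched in CloudFormation with a CloudWatch metric math expression. This is only a sketch under assumptions: it takes number_of_messages_visible to mean the CloudWatch metric ApproximateNumberOfMessagesVisible, and the resource names DLQGrowthAlarm, MyDeadLetterQueue and MyAlarmTopic are placeholders for resources defined elsewhere in the template.

```yaml
# Sketch only: resource names are placeholders, not a tested template.
DLQGrowthAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmDescription: "Alarm when the visible message count in the DLQ increases"
    Metrics:
      # Raw metric: approximate number of visible messages in the queue.
      - Id: "visible"
        ReturnData: false
        MetricStat:
          Metric:
            Namespace: "AWS/SQS"
            MetricName: "ApproximateNumberOfMessagesVisible"
            Dimensions:
              - Name: "QueueName"
                Value: !GetAtt MyDeadLetterQueue.QueueName
          Period: 60 # one-minute window, as in the question
          Stat: "Maximum"
      # DIFF returns the change from the previous datapoint.
      - Id: "delta"
        Expression: "DIFF(visible)"
        Label: "Change in visible messages"
        ReturnData: true
    EvaluationPeriods: 1
    Threshold: 0
    ComparisonOperator: "GreaterThanThreshold"
    TreatMissingData: "notBreaching"
    AlarmActions:
      - !Ref MyAlarmTopic
```

One limitation of this approach: if one message arrives while another is removed within the same period, the delta is zero and the alarm does not fire.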
Recommended answer
I struggled with the same problem and the answer for me was to use NumberOfMessagesSent instead. Then I could set my criteria for new messages that came in during my configured period of time. Here is what worked for me in CloudFormation.
Note that no further notifications are sent while the alarm stays in the alarm state because of constant failures. You can set up another alarm to catch those cases, e.g. one that fires when 100 errors occur within 1 hour, using the same method.
Updated: Because the NumberOfMessagesReceived and NumberOfMessagesSent metrics depend on how the message is queued, I have devised a new solution for our needs that uses the ApproximateNumberOfMessagesDelayed metric after adding a delay to the DLQ settings. If you are adding messages to the queue manually, then NumberOfMessagesReceived will work. Otherwise, use ApproximateNumberOfMessagesDelayed after setting up a delay.
MyDeadLetterQueue:
  Type: AWS::SQS::Queue
  Properties:
    MessageRetentionPeriod: 1209600 # 14 days
    DelaySeconds: 60 # for alarms
DLQthresholdAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmDescription: "Alarm dlq messages when we have 1 or more failed messages in 10 minutes"
    Namespace: "AWS/SQS"
    MetricName: "ApproximateNumberOfMessagesDelayed"
    Dimensions:
      - Name: "QueueName"
        Value:
          Fn::GetAtt:
            - "MyDeadLetterQueue"
            - "QueueName"
    Statistic: "Sum"
    Period: 300
    DatapointsToAlarm: 1
    EvaluationPeriods: 2
    Threshold: 1
    ComparisonOperator: "GreaterThanOrEqualToThreshold"
    AlarmActions:
      - !Ref MyAlarmTopic
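The second alarm suggested in the note above (100 errors within 1 hour) might look roughly like the following. This is a sketch under assumptions, not part of the original answer: the resource name DLQHourlyVolumeAlarm is hypothetical, it reuses the MyDeadLetterQueue and MyAlarmTopic resources from the template above, and the TreatMissingData setting is an added choice.

```yaml
# Sketch only: a companion alarm for sustained failures.
DLQHourlyVolumeAlarm: # hypothetical name
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmDescription: "Alarm when roughly 100 or more failed messages arrive within 1 hour"
    Namespace: "AWS/SQS"
    MetricName: "ApproximateNumberOfMessagesDelayed"
    Dimensions:
      - Name: "QueueName"
        Value: !GetAtt MyDeadLetterQueue.QueueName
    Statistic: "Sum"
    Period: 3600 # one hour
    EvaluationPeriods: 1
    Threshold: 100
    ComparisonOperator: "GreaterThanOrEqualToThreshold"
    TreatMissingData: "notBreaching" # assumption: no data means no new failures
    AlarmActions:
      - !Ref MyAlarmTopic
```

Because ApproximateNumberOfMessagesDelayed is an approximate, sampled value, its hourly sum only approximates the number of messages that arrived, so the threshold of 100 should be treated as a rough figure.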