Using many consumers in SQS Queue

Question

I know that it is possible to consume an SQS queue using multiple threads. I would like to guarantee that each message will be consumed only once. I know that it is possible to change the visibility timeout of a message, e.g., to match my processing time. If my process spends more time than the visibility timeout (e.g. because of a slow connection), another thread can consume the same message.

What is the best approach to guarantee that a message will be processed once?

Answer

What is the best approach to guarantee that a message will be processed once?

You're asking for a guarantee, and you won't get one. You can reduce the probability of a message being processed more than once to a very small amount, but you won't get a guarantee.

I'll explain why, along with strategies for reducing duplication.

  1. When you put a message in SQS, SQS might actually receive that message more than once
    • For example: a minor network hiccup while sending the message caused a transient error that was automatically retried - from the message sender's perspective, it failed once, and successfully sent once, but SQS received both messages.
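  • Similar to the first example: there are a lot of computers handling messages behind the scenes, and SQS needs to make sure no message is ever lost, so messages are stored on multiple servers, which can also result in duplicates.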
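To see why a retried send can leave two copies behind, here is a toy simulation. ToyQueue and send_with_retry are hypothetical stand-ins, not the SQS or AWS SDK API. The queue stores the message on the first attempt, but the acknowledgement is "lost", so the sender retries and the queue ends up holding the message twice.

```python
class ToyQueue:
    """Hypothetical stand-in for a queue: it keeps every message it receives."""

    def __init__(self):
        self.stored = []

    def send(self, body, drop_ack=False):
        self.stored.append(body)  # the message IS stored...
        if drop_ack:
            # ...but the acknowledgement never reaches the sender
            raise TimeoutError("acknowledgement lost")


def send_with_retry(queue, body, max_attempts=3, drop_first_ack=True):
    """Retry on transient errors, as SDK clients commonly do; returns attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            queue.send(body, drop_ack=drop_first_ack and attempt == 1)
            return attempt
        except TimeoutError:
            continue  # from the sender's view, the first send "failed"
    raise RuntimeError("gave up after %d attempts" % max_attempts)


q = ToyQueue()
attempts = send_with_retry(q, "order-42")
print(attempts, q.stored)  # 2 ['order-42', 'order-42']
```

The sender believes it sent the message once, after one failure. The queue, however, holds two copies.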

For the most part, by taking advantage of the SQS message visibility timeout, the chances of duplication from these sources are already pretty small, like a fraction of a percent small.
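A toy model of the visibility timeout may help here. ToyVisibilityQueue is a hypothetical in-memory class driven by an explicit clock, not the SQS API. It only mimics one rule: a received message stays hidden for visibility_timeout seconds and reappears if nobody has deleted it by then. That is exactly the slow-consumer case from the question.

```python
class ToyVisibilityQueue:
    """Hypothetical in-memory model of an SQS-style visibility timeout (not the real API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self.messages = {}         # message_id -> body
        self.invisible_until = {}  # message_id -> time at which it becomes visible again

    def send(self, message_id, body):
        self.messages[message_id] = body
        self.invisible_until[message_id] = 0

    def receive(self, now):
        """Return one visible (message_id, body) and hide it, or None if nothing is visible."""
        for message_id, body in self.messages.items():
            if self.invisible_until[message_id] <= now:
                self.invisible_until[message_id] = now + self.visibility_timeout
                return message_id, body
        return None

    def delete(self, message_id):
        """Consumers delete a message once they have finished processing it."""
        self.messages.pop(message_id, None)
        self.invisible_until.pop(message_id, None)


q = ToyVisibilityQueue(visibility_timeout=30)
q.send("m1", "hello")
first = q.receive(now=0)    # consumer A gets m1; it is hidden until t=30
second = q.receive(now=10)  # nothing is visible while A is still working
third = q.receive(now=45)   # A never deleted m1, so consumer B gets it again
print(first, second, third)  # ('m1', 'hello') None ('m1', 'hello')
```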

If processing duplicates really isn't that bad (strive to make your message consumption idempotent!), I'd consider this good enough - reducing chances of duplication further is complicated and potentially expensive...
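Here is a minimal sketch of what idempotent consumption can look like. The message shape and the function names are hypothetical. Incrementing a balance double-counts a redelivered message. Recording the amount under a unique transfer id, and deriving the balance from those records, does not.

```python
def apply_increment(balances, msg):
    # NOT idempotent: every delivery adds the amount again
    balances[msg["account"]] = balances.get(msg["account"], 0) + msg["amount"]


def apply_transfer(ledger, msg):
    # Idempotent: keyed by a unique transfer id, so a redelivery just overwrites itself
    ledger[msg["transfer_id"]] = (msg["account"], msg["amount"])


def balance(ledger, account):
    return sum(amount for acct, amount in ledger.values() if acct == account)


msg = {"transfer_id": "t-1", "account": "acct-1", "amount": 10}
deliveries = [msg, msg]  # the same message delivered twice

balances, ledger = {}, {}
for m in deliveries:
    apply_increment(balances, m)
    apply_transfer(ledger, m)

print(balances["acct-1"], balance(ledger, "acct-1"))  # 20 10
```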

Ok, here we go down the rabbit hole... at a high level, you will want to assign unique ids to your messages, and check against an atomic cache of ids that are in progress or completed before starting processing:

  1. Make sure your messages have unique identifiers provided at insertion time
    • Without this, you'll have no way of telling duplicates apart.
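  • If your message receivers in turn send messages onward for further processing, that can be another source of duplicates (for reasons similar to the above).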
  • InProgress entries should have a timeout based on how fast you need to recover in case of processing failure.
  • Completed entries should have a timeout based on how long you want your deduplication window to be.
  • The simplest option is probably a Guava cache, but that only works for a single processing application. If you have a lot of messages or distributed consumption, consider a database for this job (with a background process to sweep for expired entries).
  • You probably can't afford infinite storage, so entries can't be kept forever.
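The steps above can be sketched in Python for the single-process case. DedupCache, make_message and consume are hypothetical names, and the TTL values are arbitrary. An in-memory dict guarded by a threading.Lock stands in for the Guava cache or database mentioned above. The three operations map onto the steps as follows:
  • try_start is the atomic "check, then mark InProgress" step.
  • complete marks the id Completed.
  • sweep drops expired entries.

```python
import threading
import time
import uuid

IN_PROGRESS = "InProgress"
COMPLETED = "Completed"


class DedupCache:
    """Hypothetical single-process dedup cache: message_id -> (state, expires_at)."""

    def __init__(self, in_progress_ttl, completed_ttl, clock=time.monotonic):
        self.in_progress_ttl = in_progress_ttl  # how fast to recover from a failed consumer
        self.completed_ttl = completed_ttl      # length of the deduplication window
        self.clock = clock
        self._lock = threading.Lock()
        self._entries = {}

    def try_start(self, message_id):
        """Atomically check and claim: False if the id is InProgress or Completed."""
        with self._lock:
            now = self.clock()
            entry = self._entries.get(message_id)
            if entry is not None and entry[1] > now:
                return False  # live entry: a duplicate, or another thread is on it
            self._entries[message_id] = (IN_PROGRESS, now + self.in_progress_ttl)
            return True

    def complete(self, message_id):
        with self._lock:
            self._entries[message_id] = (COMPLETED, self.clock() + self.completed_ttl)

    def sweep(self):
        """Drop expired entries (storage is not infinite); returns how many were removed."""
        with self._lock:
            now = self.clock()
            expired = [mid for mid, (_, expires_at) in self._entries.items() if expires_at <= now]
            for mid in expired:
                del self._entries[mid]
            return len(expired)


def make_message(body):
    # The unique id is assigned once, at insertion time, so a retried send reuses it
    return {"id": str(uuid.uuid4()), "body": body}


def consume(cache, message, handler):
    if not cache.try_start(message["id"]):
        return False  # skip: already being processed or already done
    handler(message["body"])  # if this raises, the InProgress entry just expires
    cache.complete(message["id"])
    return True


cache = DedupCache(in_progress_ttl=60, completed_ttl=3600)
msg = make_message("charge order 42")
processed = []
first = consume(cache, msg, processed.append)
second = consume(cache, dict(msg), processed.append)  # redelivery carrying the same id
print(first, second, processed)  # True False ['charge order 42']
```

If the handler raises, complete is never called. The InProgress entry then expires after in_progress_ttl, and the id can be claimed again; that expiry is the recovery window described above. With several processes or hosts, the check-and-claim has to become a single atomic operation in the shared store.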

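Some notes

  • Keep in mind that the chance of duplicates without all of that is already pretty low. Depending on how much time and money deduplication of messages is worth to you, feel free to skip or modify any of the steps.
    • For example, you could leave out "InProgress", but that opens up the small chance of two threads working on a duplicated message at the same time (the second one starting before the first has "Completed" it).
  • Your application could crash, hang, or go through a very long GC pause after processing a message but before its messageId is marked "Completed" (maybe you're using a database for this storage and the connection to it dropped).
    • In this case, the "InProgress" entry will eventually time out, and another thread could process the message again (after the SQS visibility timeout also expires, or because SQS holds a duplicate of it).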
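The crash scenario in the last note can be traced step by step. ClaimTable is a hypothetical minimal claim table driven by an explicit clock. To keep it short, it is single-threaded and its Completed entries never expire. Worker A claims the message and performs its side effect, but "crashes" before marking it Completed. Once the InProgress entry expires, worker B claims the message and the side effect happens twice.

```python
class ClaimTable:
    """Hypothetical minimal InProgress/Completed table with an explicit clock.

    Single-threaded, and Completed entries never expire, to keep the trace short.
    """

    def __init__(self, in_progress_ttl):
        self.in_progress_ttl = in_progress_ttl
        self.entries = {}  # message_id -> (state, expires_at)

    def try_start(self, message_id, now):
        entry = self.entries.get(message_id)
        if entry is not None:
            state, expires_at = entry
            if state == "Completed" or expires_at > now:
                return False
        self.entries[message_id] = ("InProgress", now + self.in_progress_ttl)
        return True

    def complete(self, message_id):
        self.entries[message_id] = ("Completed", None)


side_effects = []
table = ClaimTable(in_progress_ttl=30)

# t=0: worker A claims m1 and performs the side effect...
if table.try_start("m1", now=0):
    side_effects.append("charged m1 (worker A)")
    # ...then crashes or loses its DB connection before table.complete("m1") runs

# t=10: a redelivery reaches worker B, but A's claim is still live, so B skips it
skipped_at_10 = not table.try_start("m1", now=10)

# t=40: the InProgress entry has expired, so worker B claims m1 and does the work again
if table.try_start("m1", now=40):
    side_effects.append("charged m1 (worker B)")
    table.complete("m1")

print(skipped_at_10, side_effects)
# True ['charged m1 (worker A)', 'charged m1 (worker B)']
```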
