Amazon Kinesis & AWS Lambda Retries

Problem Description

I'm very new to Amazon Kinesis, so maybe this is just a gap in my understanding, but the AWS Lambda FAQ says:

The Amazon Kinesis and DynamoDB Streams records sent to your AWS Lambda function are strictly serialized, per shard. This means that if you put two records in the same shard, Lambda guarantees that your Lambda function will be successfully invoked with the first record before it is invoked with the second record. If the invocation for one record times out, is throttled, or encounters any other error, Lambda will retry until it succeeds (or the record reaches its 24-hour expiration) before moving on to the next record. The ordering of records across different shards is not guaranteed, and processing of each shard happens in parallel.

My question is: what happens if, for some reason, a producer puts malformed data onto a shard, and when the Lambda function picks it up it errors out and just keeps retrying? That would mean processing of that particular shard is blocked by the error for 24 hours.

Is the best practice for handling application errors like that to wrap the problem in a custom error, send this error downstream along with all the successfully processed records, and let the consumer deal with it? Of course, this still wouldn't help in the case of an unrecoverable error that crashes the program, like a null pointer: again we'd be back to the blocking retry loop for the next 24 hours.
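For concreteness, the "wrap the problem in a custom error and send it downstream" idea might look roughly like this; the `RecordProcessingError` class and the envelope shape are purely illustrative, not an established AWS pattern:

```python
import json


class RecordProcessingError(Exception):
    """Hypothetical wrapper for a record that could not be processed."""

    def __init__(self, raw_payload, cause):
        super().__init__(str(cause))
        self.raw_payload = raw_payload
        self.cause = cause


def to_downstream_message(raw_payload):
    """Return either the processed result or an error envelope, so the
    batch as a whole succeeds and the shard keeps moving."""
    try:
        result = json.loads(raw_payload)  # stand-in for the real processing
        return {"status": "ok", "data": result}
    except Exception as exc:
        wrapped = RecordProcessingError(raw_payload, exc)
        return {"status": "error",
                "payload": wrapped.raw_payload,
                "reason": str(wrapped)}
```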

Solution

Don't overthink it; Kinesis is just a queue. You have to consume a record (i.e. pop it from the queue) successfully in order to proceed to the next one, strictly first in, first out.

The appropriate approach should be:

  • Get a record from the stream.
  • Process it in a try-catch-finally block (see the sketch after this list).
  • If the record is processed successfully, no problem. <- TRY
  • But if it fails, note it down somewhere else so you can investigate why it failed. <- CATCH
  • And at the end of your logic blocks, always persist the position to DynamoDB. <- FINALLY
  • If an internal error occurs in your system (memory error, hardware error, etc.), that is another story, as it may affect the processing of all the records, not just one.
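A minimal sketch of the handler described above, assuming hypothetical DynamoDB tables `failed-records` and `shard-checkpoints` and a placeholder `process_record` for the business logic; it illustrates the try/catch/finally idea rather than being a drop-in implementation:

```python
import base64
import json

import boto3

dynamodb = boto3.resource("dynamodb")
# Hypothetical table names; substitute your own resources.
failed_table = dynamodb.Table("failed-records")
checkpoint_table = dynamodb.Table("shard-checkpoints")


def process_record(payload):
    """Placeholder for your business logic; raises on malformed input."""
    return json.loads(payload)


def handler(event, context):
    last_sequence = None
    for record in event["Records"]:
        kinesis = record["kinesis"]
        payload = base64.b64decode(kinesis["data"]).decode("utf-8")
        try:
            process_record(payload)                    # <- TRY
        except Exception as exc:
            # Note the failure somewhere else instead of re-raising,
            # so one bad record cannot block the shard.  <- CATCH
            failed_table.put_item(Item={
                "sequenceNumber": kinesis["sequenceNumber"],
                "partitionKey": kinesis["partitionKey"],
                "payload": payload,
                "error": str(exc),
            })
        finally:
            last_sequence = kinesis["sequenceNumber"]  # <- FINALLY
    if last_sequence is not None:
        # Persist how far this shard got; the shard id is the first half
        # of eventID ("shardId-...:sequenceNumber").
        shard_id = event["Records"][-1]["eventID"].split(":")[0]
        checkpoint_table.put_item(Item={
            "shardId": shard_id,
            "sequenceNumber": last_sequence,
        })
    return {"processed": len(event["Records"])}
```

Because the handler swallows per-record failures instead of raising, Lambda treats the batch as successful and moves past the bad record instead of retrying it for 24 hours.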

By the way, if processing a record takes more than a minute, you are obviously doing something wrong. Kinesis is designed to handle thousands of records per second, so you don't have the luxury of spending that long on each one.

The question you are asking is a general problem with queue systems, sometimes called the "poison message" problem. You have to handle such messages in your business logic to be safe.

http://www.cogin.com/articles/SurvivingPoisonMessages.php#PoisonMessages
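One common guard against poison messages is to count delivery attempts per record and set a record aside once a threshold is exceeded. A minimal sketch, assuming a hypothetical DynamoDB table `record-attempts` keyed by sequence number and an arbitrary threshold:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
# Hypothetical table keyed by the Kinesis record's sequence number.
attempts_table = dynamodb.Table("record-attempts")
MAX_ATTEMPTS = 3  # assumed threshold, tune for your workload


def is_poisonous(sequence_number):
    """Atomically bump the attempt counter for this record and report
    whether it has now been seen more than MAX_ATTEMPTS times."""
    response = attempts_table.update_item(
        Key={"sequenceNumber": sequence_number},
        UpdateExpression="ADD attempts :one",
        ExpressionAttributeValues={":one": 1},
        ReturnValues="UPDATED_NEW",
    )
    return response["Attributes"]["attempts"] > MAX_ATTEMPTS
```

Records flagged this way can be written to a dead-letter location for later inspection instead of being retried yet again.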
