AWS Kinesis, concurrent Lambda processing with guaranteed ordering

Question

I have a Lambda function whose event source is a Kinesis stream consumer (with an arbitrary number of shards).

I would like to ensure that items in the stream with the same partition key are processed by Lambda in sequence and not simultaneously. (The partition key is being used as the object's identity, and I don't want multiple Lambda invocations performing logic on the same object at the same time.)

For example, if the items in the stream have partition keys:

1,2,1,3,4,1,2,1

If we take the order of processing to be left to right, Lambda would process one item for each of the partition keys 1, 2, 3 and 4 concurrently. Then, once it has finished an item with a specific partition key, it can start processing the next item with that key.

Is this achievable in some way, without resorting to a distributed lock that would make inefficient use of Lambda?

Thanks

Recommended answer

Seems like I was tackling the problem in the wrong way. Records with the same partition key are routed to the same shard, and Lambda guarantees that, within a shard, the function is invoked with one batch at a time. Therefore there is no need for a distributed lock: at worst, a single batch will contain multiple records belonging to the same entity, and processing them in order can be managed in memory within the Lambda function itself.
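The in-memory handling described above could look roughly like the following. This is only a sketch under stated assumptions: `process_entity_record` is a hypothetical placeholder for the real per-object logic, payloads are assumed to be UTF-8 text, and the worker count of 8 is arbitrary. Records are grouped by partition key in arrival order; different keys run in parallel threads, while each key's records are handled strictly one after another.

```python
import base64
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def process_entity_record(partition_key, payload):
    """Hypothetical business logic for one record; replace with real work."""
    return payload


def group_by_partition_key(records):
    """Group Kinesis records by partition key, keeping arrival order per key."""
    groups = defaultdict(list)
    for record in records:
        groups[record["kinesis"]["partitionKey"]].append(record)
    return groups


def process_group(partition_key, records):
    """Process all records of one key sequentially, in batch order."""
    results = []
    for record in records:
        # Kinesis record data arrives base64-encoded in the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        results.append(process_entity_record(partition_key, payload))
    return results


def handler(event, context):
    groups = group_by_partition_key(event["Records"])
    # One task per key: keys run concurrently, records within a key do not.
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = {
            key: pool.submit(process_group, key, records)
            for key, records in groups.items()
        }
        return {key: future.result() for key, future in futures.items()}
```

If the per-record work is light, the thread pool can be dropped and the groups processed in a plain loop; the ordering guarantee within each key is the same either way.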

Reference from the AWS Lambda FAQs: http://aws.amazon.com/lambda/faqs/

Q: How does AWS Lambda process data from Amazon Kinesis streams and Amazon DynamoDB Streams?

The Amazon Kinesis and DynamoDB Streams records sent to your AWS Lambda function are strictly serialized, per shard. This means that if you put two records in the same shard, Lambda guarantees that your Lambda function will be successfully invoked with the first record before it is invoked with the second record. If the invocation for one record times out, is throttled, or encounters any other error, Lambda will retry until it succeeds (or the record reaches its 24-hour expiration) before moving on to the next record. The ordering of records across different shards is not guaranteed, and processing of each shard happens in parallel.
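The per-shard guarantee quoted above covers the original question because Kinesis picks the shard by taking the MD5 hash of the partition key and finding the shard whose hash-key range contains it, so equal keys always land in the same shard. The sketch below illustrates that mapping. It assumes the 128-bit hash space is split evenly across shards, which may not hold for a stream that has been resharded, and `shard_for_key` is an illustrative helper, not an AWS API.

```python
import hashlib
from collections import defaultdict


def shard_for_key(partition_key, shard_count):
    """Index of the shard whose hash-key range contains the key's MD5 hash.

    Assumes the 128-bit hash space is divided evenly between shards.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    range_size = 2**128 // shard_count
    # Clamp so that rounding in the integer division cannot overflow the last shard.
    return min(hash_value // range_size, shard_count - 1)


def split_by_shard(partition_keys, shard_count):
    """Show which keys from a stream end up together, in arrival order."""
    shards = defaultdict(list)
    for key in partition_keys:
        shards[shard_for_key(key, shard_count)].append(key)
    return dict(shards)


if __name__ == "__main__":
    # The key sequence from the question, spread over a hypothetical 4-shard stream.
    print(split_by_shard(["1", "2", "1", "3", "4", "1", "2", "1"], 4))
```

Two different keys may share a shard, in which case they are also serialized with respect to each other; adding shards increases the parallelism across keys.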
