DynamoDB Streams with Lambda: how to process the records in order (by logical groups)?


Problem description

I want to use DynamoDB Streams + AWS Lambda to process chat messages. Messages belonging to the same conversation user_idX:user_idY (a room) must be processed in order. Global ordering is not important.

Assuming that I write to DynamoDB in the correct order (room:msg1, room:msg2, etc.), how can I guarantee that the stream feeds AWS Lambda sequentially, preserving the processing order of related messages (same room) within a single stream?

For example, given 2 shards, how do I make sure that a logical group always goes to the same shard?

I need to achieve this:

Shard 1: 12:12:msg3 12:12:msg2 12:12:msg1 ==> consumer
Shard 2: 13:24:msg2 51:91:msg3 13:24:msg1 51:92:msg2 51:92:msg1 ==> consumer

And not this (the messages respect the order in which I saved them in the database, but they are placed in different shards, so different sequences for the same room are incorrectly processed in parallel):

Shard 1: 13:24:msg2 51:92:msg2 12:12:msg2 51:92:msg2 12:12:msg1 ==> consumer
Shard 2: 51:91:msg3 12:12:msg3 13:24:msg1 51:92:msg1 ==> consumer

This official post mentions it, but I couldn't find anywhere in the docs how to implement it:


The relative ordering of a sequence of changes made to a single primary key will be preserved within a shard. Further, a given key will be present in at most one of a set of sibling shards that are active at a given point in time. As a result, your code can simply process the stream records within a shard in order to accurately track changes to an item.



Questions

1) How do I set a partition key in DynamoDB Streams?

2) How do I create stream shards that guarantee consistent delivery per partition key?

3) Is this really possible after all? The official article says that "a given key will be present in at most one of a set of sibling shards that are active at a given point in time", so it seems that msg1 may go to shard 1 and then msg2 to shard 2, as in my example above.

EDITED: In this question, I found this:


The amount of shards that your stream has, is based on the amount of partitions the table has. So if you have a DDB table with 4 partitions, then your stream will have 4 shards. Each shard corresponds to a specific partition, so given that all items with the same partition key should be present in the same partition, it also means that those items will be present in the same shard.
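The mapping described in that quote can be illustrated with a small simulation. This is only a sketch of the idea: DynamoDB's real partitioning hash is internal, so the MD5-based `shard_for` function, the `route` helper, and the shard count of 2 are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index.

    Illustrative stand-in only: DynamoDB does not expose its hash function.
    What matters is that the mapping is deterministic per key.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def route(records, shard_count=2):
    """Append each (room, message) pair to the shard chosen by its room key.

    Each shard is a list, so records keep their write order within a shard.
    """
    shards = defaultdict(list)
    for room, message in records:
        shards[shard_for(room, shard_count)].append((room, message))
    return dict(shards)


# Writes arrive interleaved across rooms, but in order within each room.
writes = [
    ("12:12", "msg1"),
    ("13:24", "msg1"),
    ("12:12", "msg2"),
    ("51:92", "msg1"),
    ("13:24", "msg2"),
    ("12:12", "msg3"),
]

for shard_id, shard_records in sorted(route(writes).items()):
    print(shard_id, shard_records)
```

Because the shard is a pure function of the key, every record of a room lands in the same list, and that list keeps the write order. Which rooms share a shard is arbitrary and depends on the hash.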

Does this mean that I can achieve what I need automatically? "All items with the same partition key will be present in the same shard". Does Lambda respect this?

EDIT 2: From the FAQ:


The ordering of records across different shards is not guaranteed, and processing of each shard happens in parallel.

I don't care about global ordering, only about the logical ordering shown in the example. Still, it is not clear from this FAQ answer whether shards group records logically.
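If the quoted statements hold, the stream itself has no separate partition key to configure: grouping follows from the table's key. One possible table design, sketched here as an assumption rather than something the question states, is to use the room identifier as the partition key and a per-room sequence number as the sort key. The attribute names `room_id` and `msg_seq` and both helper functions are hypothetical.

```python
def room_key(user_a: str, user_b: str) -> str:
    """Build one canonical room ID for a pair of users.

    Sorting makes the key independent of who sends the message, so
    ("24", "13") and ("13", "24") both give "13:24".
    """
    first, second = sorted([user_a, user_b])
    return f"{first}:{second}"


def build_message_item(sender: str, recipient: str, seq: int, body: str) -> dict:
    """Return an item for a table keyed on room_id (partition key) and
    msg_seq (sort key). Both attribute names are assumptions."""
    return {
        "room_id": room_key(sender, recipient),
        "msg_seq": seq,
        "sender": sender,
        "body": body,
    }


item = build_message_item("24", "13", 1, "hello")
print(item["room_id"])
```

With this layout, every message of a conversation shares one partition key regardless of which user sent it.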

Recommended answer

In-order processing of updates to the same key happens automatically. As described in this presentation, one Lambda function is run per active shard. Because all the updates for a particular partition/sort key appear in exactly one shard lineage, they are processed in order.
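A minimal sketch of a handler that relies on this behavior is shown below. It walks the batch in the order Lambda delivers it and collects messages per room. It assumes the stream is configured to include new images, and that the table uses the hypothetical attributes `room_id` (string) and `msg_seq` (number); the grouping dictionary stands in for whatever per-message processing is actually needed.

```python
from collections import defaultdict


def handler(event, context):
    """Process one batch of DynamoDB stream records in delivery order.

    Returns the message sequence numbers seen per room, in the order
    they appeared in the batch.
    """
    seen = defaultdict(list)
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue  # this sketch only handles newly written messages
        # Stream records carry attributes in DynamoDB JSON, e.g. {"S": "13:24"}.
        image = record["dynamodb"]["NewImage"]
        room = image["room_id"]["S"]       # hypothetical attribute name
        seq = int(image["msg_seq"]["N"])   # hypothetical attribute name
        seen[room].append(seq)
    return dict(seen)


def fake_record(room, seq, event_name="INSERT"):
    """Build a stream record with only the fields this sketch reads."""
    return {
        "eventName": event_name,
        "dynamodb": {"NewImage": {"room_id": {"S": room}, "msg_seq": {"N": str(seq)}}},
    }


sample_event = {
    "Records": [
        fake_record("13:24", 1),
        fake_record("12:12", 1),
        fake_record("13:24", 2),
    ]
}
print(handler(sample_event, None))
```

The handler does no sorting of its own: the per-room order in the result is simply the order of the records in the batch, which is the property the answer describes.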
