Reading AWS DynamoDB Stream

Question

I want to do an incremental DynamoDB backup to S3 using DynamoDB Streams. I have a Lambda function that reads the DynamoDB stream and writes files to S3. To mark shards that have already been read, I log the ExclusiveStartShardId to a configuration file.

What I do is:

  1. Describe the stream (using the logged ExclusiveStartShardId)
  2. Get the stream's shards
  3. For every shard that is CLOSED (has an EndingSequenceNumber), I do the following:
    • Get a shard iterator for that shard (shardIteratorType: 'TRIM_HORIZON')
    • Iterate through the shard and fetch records until NextShardIterator becomes null
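
The steps above can be sketched roughly as follows. This is only an illustration, not the asker's actual code: the function name `read_closed_shards` is hypothetical, `client` is assumed to behave like `boto3.client("dynamodbstreams")`, and the sketch ignores pagination of `describe_stream` (a real stream may need further calls driven by `LastEvaluatedShardId`).

```python
def read_closed_shards(client, stream_arn, exclusive_start_shard_id=None):
    """Yield (shard_id, record) pairs from every CLOSED shard of a stream.

    `client` is assumed to behave like boto3.client("dynamodbstreams").
    """
    kwargs = {"StreamArn": stream_arn}
    if exclusive_start_shard_id:
        kwargs["ExclusiveStartShardId"] = exclusive_start_shard_id
    description = client.describe_stream(**kwargs)["StreamDescription"]

    for shard in description["Shards"]:
        # A shard is closed once its sequence number range has an end.
        if "EndingSequenceNumber" not in shard["SequenceNumberRange"]:
            continue
        iterator = client.get_shard_iterator(
            StreamArn=stream_arn,
            ShardId=shard["ShardId"],
            ShardIteratorType="TRIM_HORIZON",
        )["ShardIterator"]
        # Keep reading until the response no longer carries NextShardIterator.
        while iterator:
            page = client.get_records(ShardIterator=iterator)
            for record in page["Records"]:
                yield shard["ShardId"], record
            iterator = page.get("NextShardIterator")
```

Because the loop relies on NextShardIterator going away, it terminates only for closed shards, which is exactly the limitation described next.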

The problem is that I read only closed shards, so to get new records I must wait an undetermined amount of time for the current shard to be closed.

It seems that the last shard is usually in the OPEN state (it has no EndingSequenceNumber). If I remove the check for EndingSequenceNumber from the pseudocode above, I end up in an infinite loop, because for the last shard NextShardIterator is always present. I also cannot stop when the number of fetched items is 0, because there could be "gaps" in the shard.

In this tutorial, numChanges is used to stop the infinite loop: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.LowLevel.Walkthrough.html#Streams.LowLevel.Walkthrough.Step5
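
For illustration, one way to bound the loop on an open shard is to cap the number of GetRecords calls per run and remember the last sequence number seen, so that the next run can resume with the `AFTER_SEQUENCE_NUMBER` iterator type. This is a sketch under assumptions, not the tutorial's code: the function name `read_open_shard`, the `max_polls` parameter, and the idea of storing the checkpoint in the configuration file are all hypothetical, and `client` is again assumed to behave like `boto3.client("dynamodbstreams")`.

```python
def read_open_shard(client, stream_arn, shard_id,
                    last_sequence_number=None, max_polls=10):
    """Read an OPEN shard for at most `max_polls` GetRecords calls.

    Returns (records, checkpoint), where checkpoint is the sequence number
    of the last record seen, or the value passed in if nothing new arrived.
    """
    kwargs = {"StreamArn": stream_arn, "ShardId": shard_id}
    if last_sequence_number:
        # Resume right after the record processed in the previous run.
        kwargs["ShardIteratorType"] = "AFTER_SEQUENCE_NUMBER"
        kwargs["SequenceNumber"] = last_sequence_number
    else:
        kwargs["ShardIteratorType"] = "TRIM_HORIZON"
    iterator = client.get_shard_iterator(**kwargs)["ShardIterator"]

    records, checkpoint = [], last_sequence_number
    for _ in range(max_polls):  # hard cap instead of "until iterator is null"
        page = client.get_records(ShardIterator=iterator)
        for record in page["Records"]:
            records.append(record)
            checkpoint = record["dynamodb"]["SequenceNumber"]
        iterator = page.get("NextShardIterator")
        if not iterator:  # the shard was closed while we were reading it
            break
    return records, checkpoint
```

An empty page does not end the loop here, so "gaps" in the shard are tolerated up to the poll budget; records that arrive later are picked up by the next run from the saved checkpoint.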

What is the best approach in this situation?

I also found a similar question, "Reading data from dynamodb streams". Unfortunately, I could not find an answer to my question there.

Recommended answer

Why not attach the DynamoDB stream as an event source for your Lambda function? Then Lambda will take care of polling the stream and calling your function when necessary. See this for details.
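
As a rough sketch of what such a function might look like: with the stream attached as an event source, Lambda passes batches of stream records in `event["Records"]`, and the function only has to write them to S3. The names `backup_records` and `handler`, the `BACKUP_BUCKET` environment variable, and the `backup/...` key layout are assumptions made for this example, not part of the original answer.

```python
import json
import os


def backup_records(event, s3, bucket):
    """Write each stream record in a Lambda event to S3 as one JSON object.

    `s3` is assumed to behave like boto3.client("s3").
    """
    keys = []
    for record in event["Records"]:
        # Hypothetical key layout: sequence number plus event ID.
        key = "backup/{}-{}.json".format(
            record["dynamodb"]["SequenceNumber"], record["eventID"]
        )
        s3.put_object(
            Bucket=bucket,
            Key=key,
            Body=json.dumps(record).encode("utf-8"),
        )
        keys.append(key)
    return keys


def handler(event, context):
    # Lambda entry point; boto3 is imported here so that backup_records
    # can be exercised locally with a stand-in S3 client.
    import boto3

    return backup_records(event, boto3.client("s3"), os.environ["BACKUP_BUCKET"])
```

With this setup there is no shard bookkeeping in the function itself: no DescribeStream call, no shard iterators, and no stored ExclusiveStartShardId.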
