Complete scan of DynamoDB with boto3

Question

My table is around 220 MB with about 250k records in it. I'm trying to pull all of this data into Python. I realize this needs to be a chunked batch process run in a loop, but I'm not sure how to make each batch start where the previous one left off.

Is there some way to filter my scan? From what I've read, filtering happens after the data is loaded, and loading stops at 1 MB, so I wouldn't actually be able to scan in new objects.
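
On the filtering question: DynamoDB applies a FilterExpression only after a Scan has read a page of up to 1 MB. A filter therefore reduces what is returned, not how much is read, and a filtered scan still has to follow LastEvaluatedKey; a page can even come back with zero items while LastEvaluatedKey is still present. A minimal sketch, assuming a boto3 Table resource such as `table` above, is a generator (the name `scan_all` is hypothetical) that forwards any extra Scan arguments on every page:

```python
from typing import Any, Dict, Iterator


def scan_all(table: Any, **scan_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item from a boto3 DynamoDB Table, page by page.

    Extra keyword arguments (for example FilterExpression) are passed to
    every table.scan() call. DynamoDB applies the filter after reading each
    page (up to 1 MB), so a page may contain few or no items even though
    more pages remain.
    """
    while True:
        response = table.scan(**scan_kwargs)
        # Items on this page that passed the filter (possibly none).
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            # No LastEvaluatedKey: the whole table has been read.
            return
        # Start the next page exactly where this one stopped.
        scan_kwargs["ExclusiveStartKey"] = last_key
```

For example, `list(scan_all(table, FilterExpression=Attr('status').eq('active')))`, with `Attr` imported from `boto3.dynamodb.conditions`; the `status` attribute and the `'active'` value are hypothetical. Because it is a generator, items can also be processed batch by batch without holding all 250k in memory.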

Any help would be greatly appreciated.

import boto3
dynamodb = boto3.resource('dynamodb',
    aws_session_token=aws_session_token,
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=region
    )

table = dynamodb.Table('widgetsTableName')

data = table.scan()


Answer

I think the Amazon DynamoDB documentation regarding table scanning answers your question.

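In short, you'll need to check for LastEvaluatedKey in the response. A single Scan call returns at most 1 MB of data; if more remains, the response includes LastEvaluatedKey, which you pass back as ExclusiveStartKey on the next call so it continues where the previous page stopped. Here is an example using your code: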

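import boto3

# aws_session_token, aws_access_key_id, aws_secret_access_key and region
# are assumed to be defined as in the question.
dynamodb = boto3.resource('dynamodb',
                          aws_session_token=aws_session_token,
                          aws_access_key_id=aws_access_key_id,
                          aws_secret_access_key=aws_secret_access_key,
                          region_name=region
)

table = dynamodb.Table('widgetsTableName')

# The first call returns only the first page (at most 1 MB of data).
response = table.scan()
data = response['Items']

# As long as DynamoDB reports a LastEvaluatedKey, more pages remain;
# pass it back as ExclusiveStartKey to resume after the last item read.
while 'LastEvaluatedKey' in response:
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])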
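
The loop above reads the table one page at a time. For a table of this size, the Scan API also accepts Segment and TotalSegments for a parallel scan, in which each segment is paginated independently with the same LastEvaluatedKey loop. The following is a sketch, not a tuned implementation: `scan_segment`, `parallel_scan` and the `make_table` callable are hypothetical names, `make_table` is assumed to build a fresh Table per worker (boto3 resources should not be shared across threads), and 4 segments is an arbitrary choice. Note that segments running at the same time consume read capacity concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def scan_segment(table: Any, segment: int, total_segments: int) -> List[Dict[str, Any]]:
    """Read one segment of a parallel scan, following LastEvaluatedKey."""
    kwargs: Dict[str, Any] = {"Segment": segment, "TotalSegments": total_segments}
    items: List[Dict[str, Any]] = []
    while True:
        response = table.scan(**kwargs)
        items.extend(response.get("Items", []))
        if "LastEvaluatedKey" not in response:
            return items
        # Continue this segment where its previous page stopped.
        kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]


def parallel_scan(make_table: Callable[[], Any], total_segments: int = 4) -> List[Dict[str, Any]]:
    """Scan all segments concurrently and concatenate their items in segment order."""

    def worker(segment: int) -> List[Dict[str, Any]]:
        # A fresh Table object per worker, since boto3 resources are not thread-safe.
        return scan_segment(make_table(), segment, total_segments)

    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        # pool.map yields results in input order (segment 0, 1, ...).
        results = pool.map(worker, range(total_segments))
        return [item for segment_items in results for item in segment_items]
```

A possible `make_table` is `lambda: boto3.session.Session().resource('dynamodb', region_name=region).Table('widgetsTableName')`, which creates a separate session for each thread; pass the credentials from the question to `Session(...)` if they are not configured in the environment.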
