Complete scan of DynamoDB with boto3


Question

My table is around 220 MB and holds 250k records. I'm trying to pull all of this data into Python. I realize this needs to be a chunked batch process run in a loop, but I'm not sure how to make each batch start where the previous one left off.

Is there some way to filter my scan? From what I have read, filtering occurs after loading, and loading stops at 1 MB, so I wouldn't actually be able to scan in new objects.

Any assistance would be appreciated.

import boto3
dynamodb = boto3.resource('dynamodb',
    aws_session_token = aws_session_token,
    aws_access_key_id = aws_access_key_id,
    aws_secret_access_key = aws_secret_access_key,
    region_name = region
    )

table = dynamodb.Table('widgetsTableName')

data = table.scan()

Recommended answer

I think the Amazon DynamoDB documentation on table scanning answers your question.

In short, each scan call returns at most 1 MB of data. If the response contains LastEvaluatedKey, there is more to read, and you pass that key as ExclusiveStartKey to the next call. Here is an example using your code:

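import boto3

# The credential variables and region are assumed to be defined elsewhere,
# as in the question.
dynamodb = boto3.resource('dynamodb',
                          aws_session_token=aws_session_token,
                          aws_access_key_id=aws_access_key_id,
                          aws_secret_access_key=aws_secret_access_key,
                          region_name=region
)

table = dynamodb.Table('widgetsTableName')

# The first call returns at most 1 MB of items.
response = table.scan()
data = response['Items']

# LastEvaluatedKey is present only while more items remain; pass it back
# as ExclusiveStartKey to resume where the previous call stopped.
while 'LastEvaluatedKey' in response:
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])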
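
On the filtering part of the question: a FilterExpression is applied after each page of up to 1 MB has been read, so a filtered page can come back with few or even zero items while still carrying a LastEvaluatedKey. The same loop therefore still reaches every matching item, as long as it keys off LastEvaluatedKey rather than the number of items returned. Below is a minimal sketch of that idea as a reusable generator. The name scan_all is hypothetical, and it assumes only that `table` exposes a boto3-style `scan(**kwargs)` method.

```python
def scan_all(table, **scan_kwargs):
    """Yield every item from a table, following LastEvaluatedKey.

    `table` is assumed to be any object with a boto3-style scan(**kwargs)
    method, such as the dynamodb.Table(...) object above. Extra keyword
    arguments (for example FilterExpression) are passed to every call.
    """
    kwargs = dict(scan_kwargs)  # copy so the caller's dict is not mutated
    while True:
        response = table.scan(**kwargs)
        # With a filter, a page may contain no items at all; keep going.
        yield from response.get('Items', [])
        last_key = response.get('LastEvaluatedKey')
        if last_key is None:
            break  # no LastEvaluatedKey means the scan is complete
        kwargs['ExclusiveStartKey'] = last_key
```

For example, `data = list(scan_all(table))` collects the whole table, and `list(scan_all(table, FilterExpression=Attr('color').eq('red')))` collects only matching items, with `Attr` imported from `boto3.dynamodb.conditions` (the `color` attribute is made up for illustration). Note that filtering does not reduce the amount of data read from the table, only what is returned.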
