Complete scan of DynamoDB with boto3
Question
My table is around 220 MB and holds about 250k records. I'm trying to pull all of this data into Python. I realize this needs to be a chunked batch process that loops, but I'm not sure how to make each batch start where the previous one left off.
Is there some way to filter my scan? From what I have read, filtering is applied after the data is loaded, and loading stops at 1 MB, so I wouldn't actually be able to scan in new objects.
Any assistance would be appreciated.
import boto3

dynamodb = boto3.resource(
    'dynamodb',
    aws_session_token=aws_session_token,
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=region,
)
table = dynamodb.Table('widgetsTableName')
data = table.scan()
Recommended answer
I think the Amazon DynamoDB documentation on table scanning answers your question.
In short, you'll need to check for LastEvaluatedKey in the response. Here is an example using your code:
import boto3

dynamodb = boto3.resource(
    'dynamodb',
    aws_session_token=aws_session_token,
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=region,
)
table = dynamodb.Table('widgetsTableName')

# The first call returns at most one page (up to 1 MB) of items.
response = table.scan()
data = response['Items']

# LastEvaluatedKey is present only while more pages remain.
while 'LastEvaluatedKey' in response:
    # Resume the scan right after the last key of the previous page.
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])
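
On the filtering part of the question: a filter is applied to each page after it is read, so a filtered page may come back with few or even zero items and still carry a LastEvaluatedKey. As long as the loop keeps following that key, later pages are still scanned. The pagination loop can be wrapped in a generator that forwards any extra scan arguments. This is only a minimal sketch: scan_all is a hypothetical helper name, not part of boto3, and it assumes the object passed in behaves like a boto3 Table resource.

```python
def scan_all(table, **scan_kwargs):
    """Yield every item from a table scan, following pagination.

    `table` is assumed to be a boto3 Table resource (or any object with a
    compatible scan() method). `scan_all` is a hypothetical helper name.
    """
    kwargs = dict(scan_kwargs)  # copy so the caller's arguments are not mutated
    while True:
        response = table.scan(**kwargs)
        # A page can be empty when a filter rejects everything that was read.
        yield from response.get('Items', [])
        last_key = response.get('LastEvaluatedKey')
        if last_key is None:
            break  # no LastEvaluatedKey means this was the final page
        kwargs['ExclusiveStartKey'] = last_key
```

With the table from the example above, list(scan_all(table)) collects everything. A filter could be passed through as, for example, scan_all(table, FilterExpression=Attr('status').eq('active')), where Attr comes from boto3.dynamodb.conditions and the 'status' attribute is made up for illustration. Filtering reduces what is returned, not what is read, so the scan still reads the whole table.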