Complete scan of DynamoDB with boto3
Question
My table is around 220 MB and holds about 250k records. I'm trying to pull all of this data into Python. I realize this needs to be a chunked batch process that loops over the table, but I'm not sure how to make each batch start where the previous one left off.
Is there some way to filter my scan? From what I have read, filtering happens after the data is loaded, and loading stops at 1 MB, so a filter alone wouldn't let me scan in new objects.
Any help would be appreciated.
import boto3
dynamodb = boto3.resource(
    'dynamodb',
    aws_session_token=aws_session_token,
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=region,
)
table = dynamodb.Table('widgetsTableName')
data = table.scan()
Recommended answer
I think the Amazon DynamoDB documentation on table scanning answers your question.
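In short, you'll need to check for LastEvaluatedKey in the response. A single scan call returns at most 1 MB of data; when more remains, the response includes LastEvaluatedKey, which you pass as ExclusiveStartKey to the next call. Here is an example using your code: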
import boto3

dynamodb = boto3.resource(
    'dynamodb',
    aws_session_token=aws_session_token,
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    region_name=region,
)
table = dynamodb.Table('widgetsTableName')

# The first call returns the first page (up to 1 MB of data).
response = table.scan()
data = response['Items']

# Keep scanning from where the previous page stopped until no key is returned.
while 'LastEvaluatedKey' in response:
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])
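On the filtering part of the question: a filter can be passed to every page of the same loop. The sketch below wraps the pagination in a generator; `scan_all` is a hypothetical helper name, not part of boto3, and it assumes only that `table` has a `scan` method that behaves like the boto3 Table resource. Any keyword arguments are forwarded unchanged to each `scan` call.

```python
def scan_all(table, **scan_kwargs):
    """Yield every item from a table scan, following pagination.

    `scan_all` is an illustrative helper, not a boto3 function. `table` is
    assumed to expose scan(**kwargs) like a boto3 Table resource.
    """
    kwargs = dict(scan_kwargs)  # copy so the caller's dict is not modified
    while True:
        response = table.scan(**kwargs)
        # A page can be empty when a filter rejects everything read so far.
        yield from response.get('Items', [])
        last_key = response.get('LastEvaluatedKey')
        if last_key is None:
            break  # no more pages
        # Resume the next page right after the last key that was evaluated.
        kwargs['ExclusiveStartKey'] = last_key
```

With the table from the example, `data = list(scan_all(table))` collects everything. To filter, something like `scan_all(table, FilterExpression=Attr('status').eq('active'))` should work, with `Attr` imported from `boto3.dynamodb.conditions`; the attribute name `status` is made up for illustration. Because the filter is applied after each page of up to 1 MB is read, a page may come back with few or no items while still carrying a LastEvaluatedKey, so the loop has to continue until that key is absent. The filter reduces what is returned, not what is read.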