How much time would it take to migrate data from DynamoDB to S3

Question

I have been using AWS Data Pipeline to migrate data from DynamoDB to S3. The size of the data is around 20 GB. Any thoughts on this?

Answer

AWS Data Pipeline exports an entire DynamoDB table to a single file in S3. This particular Data Pipeline template uses a percentage of your table's provisioned read capacity, as defined by the MyExportJob.myDynamoDBReadThroughputRatio variable, and scales the MapReduce job cluster accordingly. You can set the read throughput ratio anywhere from 0 to 1 (0% to 100%).
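
If you want to set that ratio programmatically rather than through the console, something like the following boto3 sketch could work. This is a hedged example, not code from the answer: the pipeline ID is a placeholder, and the parameter ID is inferred from the variable name above; your template may spell it differently, so check your pipeline definition for the exact name.

```python
import boto3

# A minimal sketch, assuming the pipeline was created from the standard
# "Export DynamoDB table to S3" template. The pipeline ID below is a
# placeholder, and the parameter ID may differ in your template.
client = boto3.client("datapipeline", region_name="us-east-1")

client.activate_pipeline(
    pipelineId="df-EXAMPLE1234567",  # placeholder: your pipeline's ID
    parameterValues=[
        # 1.0 means consume up to 100% of the table's provisioned read capacity
        {"id": "myDynamoDBReadThroughputRatio", "stringValue": "1.0"},
    ],
)
```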

If you have 20 GB of data, and Data Pipeline scans your table in parallel with MapReduce, you would consume 5,242,880 RCUs: one read capacity unit covers a 4 KB strongly consistent read, and 20 GB / 4 KB = 5,242,880. How long you want the backup to take is up to you. If you set the read throughput ratio to 1 and your table supports 11,988 reads per second, scanning the DynamoDB table should take around 5,242,880 / 11,988 ≈ 437 seconds (7 minutes and 17 seconds). The Data Pipeline job's runtime should be proportional, and very close, to the time needed to scan the table. Remember that Data Pipeline also has to start up a cluster and write the backup to S3.
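
As a sanity check on the numbers above, here is the same arithmetic as a small Python sketch; the 20 GB table size and 11,988 reads-per-second rate are just the example figures from this answer, not recommendations.

```python
# Estimate full-table-scan time using the example figures from the answer.
TABLE_SIZE_BYTES = 20 * 1024**3   # 20 GB table
READ_UNIT_BYTES = 4 * 1024        # one RCU = one 4 KB strongly consistent read
READS_PER_SECOND = 11_988         # example read rate from the answer

total_rcus = TABLE_SIZE_BYTES // READ_UNIT_BYTES   # 5,242,880 RCUs
scan_seconds = total_rcus / READS_PER_SECOND       # ~437 seconds

print(f"RCUs consumed: {total_rcus:,}")
print(f"Scan time: ~{scan_seconds:.0f} s "
      f"({scan_seconds // 60:.0f} min {scan_seconds % 60:.0f} s)")
```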
