How much time would it take to migrate data from DynamoDB to S3


Question

I have been using AWS Data Pipeline to migrate data from DynamoDB to S3. The data size is around 20 GB. Any thoughts on how long this should take?

Answer

AWS Data Pipeline exports the entire DynamoDB table to one file in S3. This particular Data Pipeline template uses a percentage of your table's provisioned read capacity, as defined by the MyExportJob.myDynamoDBReadThroughputRatio variable, and scales the MapReduce job cluster appropriately. You can set the read throughput ratio anywhere from 0 to 1 (0%-100%).
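If you drive the export programmatically, the throughput ratio can be supplied as a parameter value when the pipeline is activated. The sketch below is a minimal boto3 example; the pipeline id is a placeholder, and the parameter ids (myDDBReadThroughputRatio, myDDBTableName, myOutputS3Loc) are assumptions based on the standard "Export DynamoDB table to S3" template, so check the ids your own pipeline definition actually exposes.

```python
# Hedged sketch: activate an existing DynamoDB-to-S3 export pipeline and
# override its read throughput ratio. Pipeline id and parameter ids are
# placeholders/assumptions, not guaranteed to match your template.
import boto3

datapipeline = boto3.client("datapipeline", region_name="us-east-1")

datapipeline.activate_pipeline(
    pipelineId="df-0123456789EXAMPLE",  # placeholder pipeline id
    parameterValues=[
        # Fraction of the table's provisioned read capacity the export may use (0.0-1.0)
        {"id": "myDDBReadThroughputRatio", "stringValue": "1.0"},
        {"id": "myDDBTableName", "stringValue": "MyTable"},
        {"id": "myOutputS3Loc", "stringValue": "s3://my-bucket/dynamodb-export/"},
    ],
)
```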

If you have 20 GB of data and Data Pipeline scans your table in parallel with MapReduce, the export consumes about 5,242,880 RCUs (20 GB ÷ 4 KB per read capacity unit). How long the backup takes is up to you. If you set the read throughput ratio to 1 and the table's provisioned read throughput is 11,988 RCU per second, scanning the DynamoDB table should take roughly 5,242,880 / 11,988 ≈ 437 seconds (about 7 minutes 17 seconds). The Data Pipeline job's runtime should be proportional to, and very close to, the time needed to scan the table. Remember that Data Pipeline also has to start up an EMR cluster and write the backup to S3.
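As a quick sanity check, here is the same arithmetic as a small Python sketch. It assumes strongly consistent reads at 4 KB per read capacity unit and reuses the 20 GB and 11,988 RCU figures from the example above; plug in your own table's numbers.

```python
# Back-of-the-envelope estimate of the table scan time for the export.
# Assumes 4 KB per read capacity unit; 11,988 RCU/s is just the example value.
data_size_gb = 20
provisioned_rcu = 11_988           # table read capacity units per second
read_throughput_ratio = 1.0        # myDynamoDBReadThroughputRatio (0.0 - 1.0)

data_size_kb = data_size_gb * 1024 * 1024
total_rcu = data_size_kb / 4                       # 4 KB per read capacity unit
effective_rcu_per_sec = provisioned_rcu * read_throughput_ratio

scan_seconds = total_rcu / effective_rcu_per_sec
print(f"~{total_rcu:,.0f} RCUs consumed; scan takes ~{scan_seconds:,.0f} s "
      f"(~{scan_seconds / 60:.1f} min), plus cluster startup and S3 write time")
```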
