Using Amazon Data Pipeline to back up DynamoDB data to S3


Question

I need to back up my DynamoDB table data to S3 using Amazon Data Pipeline.

My question is: can I use a single data pipeline to back up multiple DynamoDB tables to S3, or do I have to create a separate pipeline for each of them?

Also, since my tables have a year_month prefix (e.g. 2014_3_tableName), I was thinking of using the Data Pipeline SDK to change the table name in the pipeline definition once the month changes. Will this work? Is there an alternative or better way?

Thanks!

Recommended answer

If you set up your Data Pipeline through the DynamoDB console's Import/Export button, you will have to create a separate pipeline per table. If you use Data Pipeline directly (either through the Data Pipeline API or through the Data Pipeline console), you can export multiple tables in the same pipeline. For each table, add an additional DynamoDBDataNode and an EmrActivity that links that data node to the output S3DataNode.
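As a rough illustration of the multi-table layout described above, the following Python sketch generates the per-table objects for a pipeline definition file. It is not a complete or tested definition. The field names (tableName, directoryPath, input, output, runsOn, step) follow the pipeline definition file syntax as I understand it. The helper name build_export_objects, the object ids, the bucket, the table names and the cluster id "ExportCluster" are all made up for the example. Giving each table its own S3DataNode subdirectory is my own choice to keep the exports apart. The step string is deliberately a placeholder; copy the real one from an existing export pipeline.

```python
import json


def build_export_objects(table_names, s3_root, export_step, cluster_id="ExportCluster"):
    """Return one DynamoDBDataNode, S3DataNode and EmrActivity per table.

    All ids and the cluster_id default are hypothetical. A full definition
    would also need the schedule, the EmrCluster resource and defaults.
    """
    objects = []
    root = s3_root.rstrip("/")
    for name in table_names:
        source_id = f"DDBSource_{name}"
        target_id = f"S3Target_{name}"
        # Input node: the DynamoDB table to export.
        objects.append({"id": source_id, "type": "DynamoDBDataNode", "tableName": name})
        # Output node: a per-table directory (an assumption of this sketch).
        objects.append(
            {"id": target_id, "type": "S3DataNode", "directoryPath": f"{root}/{name}/"}
        )
        # Activity that links the input node to the output node.
        objects.append(
            {
                "id": f"Export_{name}",
                "type": "EmrActivity",
                "input": {"ref": source_id},
                "output": {"ref": target_id},
                "runsOn": {"ref": cluster_id},
                "step": export_step,  # placeholder, not a real step string
            }
        )
    return objects


if __name__ == "__main__":
    objs = build_export_objects(
        ["2014_3_orders", "2014_3_users"],  # hypothetical table names
        "s3://my-backup-bucket/dynamodb",  # hypothetical bucket
        "<step string copied from an existing export pipeline>",
    )
    print(json.dumps({"objects": objs}, indent=2))
```

The generated objects would be merged with the rest of the definition before uploading it through the Data Pipeline API or console.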

Regarding your year_month prefix use case, using the Data Pipeline SDK to change the table names periodically seems like the best approach. Another option is to copy the Hive script that the export EmrActivity runs (you can see the script location under the "step" of the activity) and change how it determines the table name, so that it derives the name from the current date. You would host the modified script in your own S3 bucket and point the EmrActivity to that location instead of the default. I have not tried either approach before, but both are theoretically possible.
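A minimal sketch of the SDK approach, assuming Python with a boto3 Data Pipeline client: it reads the current definition, rewrites every tableName field to the current year_month prefix, and writes the definition back. Like the answer itself, this is untested against a live pipeline. The helper names are made up for the example, and it assumes table names always follow the year_month_name pattern from the question, with the month not zero-padded.

```python
from datetime import date


def roll_table_name(old_name, today):
    """Replace a leading year_month prefix, e.g. 2014_3_tableName -> 2014_4_tableName."""
    parts = old_name.split("_", 2)
    if len(parts) != 3 or not (parts[0].isdigit() and parts[1].isdigit()):
        return old_name  # not in the expected pattern; leave it alone
    return f"{today.year}_{today.month}_{parts[2]}"


def retarget_objects(pipeline_objects, today):
    """Return a copy of API-style pipeline objects with tableName fields rolled forward."""
    updated = []
    for obj in pipeline_objects:
        fields = []
        for field in obj.get("fields", []):
            if field.get("key") == "tableName" and "stringValue" in field:
                field = {**field, "stringValue": roll_table_name(field["stringValue"], today)}
            fields.append(field)
        updated.append({**obj, "fields": fields})
    return updated


def update_pipeline(client, pipeline_id, today):
    """Fetch the definition, roll the table names, and put the definition back.

    `client` is expected to behave like boto3.client("datapipeline").
    """
    current = client.get_pipeline_definition(pipelineId=pipeline_id)
    kwargs = {
        "pipelineId": pipeline_id,
        "pipelineObjects": retarget_objects(current["pipelineObjects"], today),
    }
    # Pass parameters through unchanged when the pipeline has any.
    for key in ("parameterObjects", "parameterValues"):
        if current.get(key):
            kwargs[key] = current[key]
    return client.put_pipeline_definition(**kwargs)
```

It could be run once a month, for example as update_pipeline(boto3.client("datapipeline"), "df-EXAMPLE", date.today()), where "df-EXAMPLE" stands in for your real pipeline ID. Check the response for validation errors before relying on the updated definition.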

More general information about exporting DynamoDB tables can be found in the DynamoDB Developer Guide, and more detailed information can be found in the AWS Data Pipeline Developer Guide.
