Backup DynamoDB Table with dynamic columns to S3


Problem Description

I have read several other posts about this, and in particular this question with an answer by greg about how to do it in Hive. I would like to know, though, how to account for DynamoDB tables with a variable number of columns.

That is, the original DynamoDB table has rows that were added dynamically with different columns. I have tried to view the exportDynamoDBToS3 script that Amazon uses in their Data Pipeline service, but it has code like the following, which does not seem to map the columns:

-- Map DynamoDB Table
CREATE EXTERNAL TABLE dynamodb_table (item map<string,string>)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "MyTable");

(As an aside, I have also tried using the Data Pipeline system, but found it rather frustrating, as I could not figure out from the documentation how to perform simple tasks like running a shell script without everything failing.)

Solution

It turns out that the Hive script that I posted in the original question works just fine, but only if you are using the correct version of Hive. It seems that even with the install-hive command set to install the latest version, the version actually used depends on the AMI version.

After doing a fair bit of searching I managed to find the following in Amazon's docs (emphasis mine):

Create a Hive table that references data stored in Amazon DynamoDB. This is similar to the preceding example, except that you are not specifying a column mapping. The table must have exactly one column of type map. If you then create an EXTERNAL table in Amazon S3 you can call the INSERT OVERWRITE command to write the data from Amazon DynamoDB to Amazon S3. You can use this to create an archive of your Amazon DynamoDB data in Amazon S3. Because there is no column mapping, you cannot query tables that are exported this way. Exporting data without specifying a column mapping is available in Hive 0.8.1.5 or later, which is supported on Amazon EMR AMI 2.2.3 and later.

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/EMR_Hive_Commands.html
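Putting the documented pieces together, the export ends up looking something like the sketch below; the S3 path and the s3_export table name are placeholders, not values from the original post:

-- DynamoDB-backed table with a single map<string,string> column,
-- so no per-column mapping is needed and dynamic attributes are preserved.
CREATE EXTERNAL TABLE dynamodb_table (item map<string,string>)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "MyTable");

-- S3-backed table that will hold the archive.
CREATE EXTERNAL TABLE s3_export (item map<string,string>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
LOCATION 's3://my-bucket/dynamodb-backup/';

-- Copy every item from DynamoDB into the S3 archive.
INSERT OVERWRITE TABLE s3_export SELECT * FROM dynamodb_table;

Because there is no column mapping, the archive can be read back through the same single-map schema, but, as the quoted documentation notes, it cannot be queried by individual columns.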
