Avoid creation of _$folder$ keys in S3 with hadoop (EMR)

Problem description

I am using an EMR Activity in AWS Data Pipeline. This EMR Activity runs a Hive script on an EMR cluster. It takes a DynamoDB table as input and stores the output data in S3.

This is the EMR step used in the EMR Activity:

s3://elasticmapreduce/libs/script-runner/script-runner.jar,s3://elasticmapreduce/libs/hive/hive-script,--run-hive-script,--hive-versions,latest,--args,-f,s3://my-s3-bucket/hive/my_hive_script.q,-d,DYNAMODB_INPUT_TABLE1=MyTable,-d,S3_OUTPUT_BUCKET=#{output.directoryPath}

where output.directoryPath is:

s3://my-s3-bucket/output/#{format(@scheduledStartTime,"YYYY-MM-dd")}
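
For reference, the same comma-separated step string decomposes into a jar plus its argument list when a step is submitted directly to EMR. Below is a minimal boto3 sketch of that decomposition (not part of the original pipeline definition; the cluster id and the resolved output path are placeholders):

import boto3

emr = boto3.client("emr")

emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",  # placeholder cluster id
    Steps=[
        {
            "Name": "Run hive script",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                # script-runner.jar is the jar; everything else is its arguments
                "Jar": "s3://elasticmapreduce/libs/script-runner/script-runner.jar",
                "Args": [
                    "s3://elasticmapreduce/libs/hive/hive-script",
                    "--run-hive-script",
                    "--hive-versions", "latest",
                    "--args",
                    "-f", "s3://my-s3-bucket/hive/my_hive_script.q",
                    "-d", "DYNAMODB_INPUT_TABLE1=MyTable",
                    # #{output.directoryPath} resolves to something like:
                    "-d", "S3_OUTPUT_BUCKET=s3://my-s3-bucket/output/2017-03-18",
                ],
            },
        }
    ],
)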

So this creates one folder and one file in S3 (technically speaking, it creates two keys, 2017-03-18/<some_random_number> and 2017-03-18_$folder$):

2017-03-18
2017-03-18_$folder$

How can I avoid the creation of these extra empty _$folder$ files?

I found a solution listed at https://issues.apache.org/jira/browse/HADOOP-10400 but I don't know how to implement it in AWS Data Pipeline.

Answer

EMR doesn't seem to provide a way to avoid this.

Because S3 uses a key-value pair storage system, the Hadoop file system implements directory support in S3 by creating empty files with the "_$folder$" suffix.

You can safely delete any empty files with the <directoryname>_$folder$ suffix that appear in your S3 buckets. These empty files are created by the Hadoop framework at runtime, but Hadoop is designed to process data even if these empty files are removed.

https://aws.amazon.com/premiumsupport/knowledge-center/emr-s3-empty-files/
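
If you want to clean the markers up after the fact, a minimal boto3 sketch along these lines could work (the bucket and prefix names are placeholders):

import boto3

s3 = boto3.client("s3")

bucket = "my-s3-bucket"  # placeholder
prefix = "output/"       # placeholder

# Walk the bucket and delete every "_$folder$" marker key.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith("_$folder$"):
            s3.delete_object(Bucket=bucket, Key=obj["Key"])
            print("deleted", obj["Key"])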

It's in the Hadoop source code, so it could be fixed, but apparently it's not fixed in EMR.

If you are feeling clever, you could create an S3 event notification that matches the _$folder$ suffix, and have it fire off a Lambda function to delete the objects after they're created.
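
A rough sketch of such a Lambda handler, assuming the bucket's notification is configured for s3:ObjectCreated:* events with a "_$folder$" suffix filter (all names here are illustrative, not from the original answer):

import urllib.parse
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Invoked by the S3 event notification; delete each marker object
    # as soon as it is created.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if key.endswith("_$folder$"):  # defensive re-check of the suffix
            s3.delete_object(Bucket=bucket, Key=key)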
