Avoid creation of _$folder$ keys in S3 with hadoop (EMR)


Question

I am using an EMR Activity in AWS Data Pipeline. This EMR Activity runs a Hive script on an EMR cluster. It takes DynamoDB as input and stores data in S3.

This is the EMR step used in the EMR Activity:

s3://elasticmapreduce/libs/script-runner/script-runner.jar,s3://elasticmapreduce/libs/hive/hive-script,--run-hive-script,--hive-versions,latest,--args,-f,s3://my-s3-bucket/hive/my_hive_script.q,-d,DYNAMODB_INPUT_TABLE1=MyTable,-d,S3_OUTPUT_BUCKET=#{output.directoryPath}

where output.directoryPath is:

s3://my-s3-bucket/output/#{format(@scheduledStartTime,"YYYY-MM-dd")}

So this creates one folder and one file in S3 (technically speaking, it creates two keys, 2017-03-18/<some_random_number> and 2017-03-18_$folder$):

2017-03-18
2017-03-18_$folder$

How can I avoid the creation of these extra empty _$folder$ files?

I found a solution listed at https://issues.apache.org/jira/browse/HADOOP-10400, but I don't know how to implement it in AWS Data Pipeline.

Answer

EMR doesn't seem to provide a way to avoid this.

Because S3 uses a key-value storage model, the Hadoop file system implements directory support in S3 by creating empty files with the "_$folder$" suffix.

You can safely delete any empty files with the <directoryname>_$folder$ suffix that appear in your S3 buckets. These empty files are created by the Hadoop framework at runtime, but Hadoop is designed to process data even if these empty files are removed.
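As a one-off cleanup, the markers can be listed and deleted with boto3. This is a sketch, not part of the original answer: `delete_folder_markers` and its arguments are hypothetical names, and the call assumes AWS credentials are already configured.

```python
def folder_marker_keys(keys):
    """Filter an iterable of S3 keys down to Hadoop's empty "_$folder$" markers."""
    return [k for k in keys if k.endswith("_$folder$")]

def delete_folder_markers(bucket, prefix=""):
    """Delete every folder-marker key under the given bucket/prefix."""
    # boto3 is imported lazily so folder_marker_keys stays usable without AWS.
    import boto3
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys = (obj["Key"] for obj in page.get("Contents", []))
        for key in folder_marker_keys(keys):
            s3.delete_object(Bucket=bucket, Key=key)
```

For the question's output path this would be called as, e.g., `delete_folder_markers("my-s3-bucket", "output/")`.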

https://aws.amazon.com/premiumsupport/knowledge-center/emr-s3-empty-files/

It's in the Hadoop source code, so it could be fixed, but apparently it's not fixed in EMR.

If you are feeling clever, you could create an S3 event notification that matches the _$folder$ suffix, and have it fire off a Lambda function to delete the objects after they're created.
