How do I prevent `hadoop fs rmr <uri>` from creating $folder$ files?


Question

We're using Amazon's Elastic MapReduce to perform some large file processing jobs. As part of our workflow, we occasionally need to remove files from S3 that may already exist. We do so using the hadoop fs interface, like this:

hadoop fs -rmr s3://mybucket/a/b/myfile.log

This removes the file from S3 as expected, but in its place leaves an empty file named "s3://mybucket/a/b_$folder$". As described in another question, Hadoop's Pig is unable to handle these files, so later steps in the workflow can choke on this file.

(Note: it doesn't seem to matter whether we use -rmr or -rm, or whether we use s3:// or s3n:// as the scheme. All of these exhibit the described behavior.)

How do I use the hadoop fs interface to remove files from S3 and be sure not to leave these troublesome files behind?

Solution

I wasn't able to figure out if it's possible to use the hadoop fs interface in this way. However, the s3cmd interface does the right thing (but only for one key at a time):

s3cmd del s3://mybucket/a/b/myfile.log

This requires configuring a ~/.s3cfg file with your AWS credentials first. s3cmd --configure will interactively help you create this file.
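
Since s3cmd del is described above as handling one key at a time, a small wrapper can loop over several keys. The following is only a sketch under assumptions: the function names delete_keys and folder_marker and the S3CMD override variable are hypothetical helpers introduced for illustration, and it assumes a leftover "_$folder$" marker is an ordinary key that s3cmd del can remove like any other.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the S3CMD variable are illustrative assumptions.
# S3CMD lets the command be swapped out (for example with "echo" for a dry run).
S3CMD="${S3CMD:-s3cmd}"

# Build the marker key for a "directory" URI:
#   s3://mybucket/a/b  ->  s3://mybucket/a/b_$folder$
folder_marker() {
  local uri="${1%/}"             # drop a trailing slash, if any
  printf '%s_$folder$\n' "$uri"  # single quotes keep $folder$ literal
}

# Delete each key passed as an argument, one s3cmd call per key.
delete_keys() {
  local key
  for key in "$@"; do
    "$S3CMD" del "$key"
  done
}

# Example usage (not run automatically):
#   delete_keys s3://mybucket/a/b/myfile.log "$(folder_marker s3://mybucket/a/b)"
```

Note that the marker name contains literal dollar signs, so when typing it by hand it needs single quotes (for example 's3://mybucket/a/b_$folder$'); otherwise the shell tries to expand $folder as a variable.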
