How do I prevent `hadoop fs -rmr <uri>` from creating $folder$ files?
Problem Description
We're using Amazon's Elastic Map Reduce to perform some large file processing jobs. As a part of our workflow, we occasionally need to remove files from S3 that may already exist. We do so using the hadoop fs interface, like this:
hadoop fs -rmr s3://mybucket/a/b/myfile.log
This removes the file from S3 appropriately, but in its place leaves an empty file named "s3://mybucket/a/b_$folder$". As described in this question, Hadoop's Pig is unable to handle these files, so later steps in the workflow can choke on this file.
(Note, it doesn't seem to matter whether we use `-rmr` or `-rm`, or whether we use `s3://` or `s3n://` as the scheme: all of these exhibit the described behavior.)
How do I use the `hadoop fs` interface to remove files from S3 and be sure not to leave these troublesome files behind?
I wasn't able to figure out whether it's possible to use the `hadoop fs` interface in this way. However, the s3cmd interface does the right thing (but only for one key at a time):
s3cmd del s3://mybucket/a/b/myfile.log
This requires first configuring a ~/.s3cfg file with your AWS credentials; `s3cmd --configure` will interactively help you create this file.
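If you do stay with `hadoop fs`, one workaround is to delete the leftover marker explicitly right after deleting the file. The marker's name can be derived from the key's parent path. This is a hedged sketch, not a confirmed fix; the bucket and key names are illustrative only:

```shell
# Sketch: derive the "_$folder$" marker that Hadoop's S3 filesystem
# leaves next to the deleted key's parent "directory".
path='s3://mybucket/a/b/myfile.log'   # example key (illustrative)
dir="${path%/*}"                      # strip the filename -> s3://mybucket/a/b
marker="${dir}"'_$folder$'            # -> s3://mybucket/a/b_$folder$

# After the normal delete, remove the marker as well. Quoting keeps
# the shell from trying to expand "$folder" as a variable:
#   hadoop fs -rmr "$path"
#   hadoop fs -rmr "$marker"
echo "$marker"
```

The double removal is redundant when no marker was created, but `hadoop fs -rmr` on a nonexistent path only reports an error rather than failing the workflow.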