IllegalArgumentException, Wrong FS when specifying input/output from s3 instead of hdfs
Problem description
I have been running my Spark job on a local cluster which has HDFS, from which the input is read and to which the output is written. Now I have set up an AWS EMR cluster and an S3 bucket where I have my input, and I want my output to be written to S3 too.

The error:

User class threw exception: java.lang.IllegalArgumentException: Wrong FS: s3://something/input, expected: hdfs://ip-some-numbers.eu-west-1.compute.internal:8020
I tried searching for this and found several questions regarding the same issue. Some suggested that it applies only to the output, but even when I disable the output I get the same error.

Another suggestion is that there is something wrong with FileSystem in my code. Here are all of the occurrences of input/output in my program:
The first occurrence is in my custom FileInputFormat, in getSplits(JobContext job), which I have not actually modified myself but could:

FileSystem fs = path.getFileSystem(job.getConfiguration());
A similar case is in my custom RecordReader, also not modified by me:

final FileSystem fs = file.getFileSystem(job);
In nextKeyValue() of my custom RecordReader, which I did write myself, I use:

FileSystem fs = FileSystem.get(jc);
And finally, when I want to detect the number of files in a folder, I use:

val fs = FileSystem.get(sc.hadoopConfiguration)
val status = fs.listStatus(new Path(path))
I assume the issue is with my code, but how can I modify the FileSystem calls to support input/output from S3?
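For context on the error above: FileSystem.get(conf) always returns the filesystem configured as fs.defaultFS, which on an EMR cluster is HDFS, while Path.getFileSystem(conf) resolves the filesystem from the path's scheme. A minimal Scala sketch of the difference, using a placeholder bucket name and assuming the S3A connector described in the answer below is on the classpath:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// FileSystem.get(conf) returns the cluster default (fs.defaultFS), here HDFS;
// passing an s3:// path to that HDFS instance is what raises the
// "Wrong FS" IllegalArgumentException shown above.
val conf = new Configuration()
val defaultFs = FileSystem.get(conf)

// Resolving from the Path picks the implementation matching the scheme,
// so an S3 path yields an S3 filesystem instead of the default one.
val inputPath = new Path("s3a://something/input") // placeholder path
val s3Fs = inputPath.getFileSystem(conf)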
Answer

The Hadoop filesystem APIs do not provide support for S3 out of the box. There are two implementations of the Hadoop filesystem APIs for S3: S3A and S3N. S3A seems to be the preferred implementation. To use it you have to do a few things:

When you create the FileSystem, include values for the following properties in the FileSystem's configuration:

fs.s3a.access.key
fs.s3a.secret.key

Also, when you specify paths, use s3a:// instead of s3://.

Note: create a simple user and try things out with basic authentication first. It is possible to get it to work with AWS's more advanced temporary credential mechanisms, but it's a bit involved, and I had to make some changes to the FileSystem code in order to get it to work when I tried.

Source of info is here
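Putting the answer's steps together, a minimal Scala sketch (the bucket name and key values are placeholders, and this assumes the S3A connector from the hadoop-aws artifact is on the classpath):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Provide the S3A credentials in the configuration used to create the FileSystem.
val conf = new Configuration()
conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY") // placeholder
conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY") // placeholder

// Use the s3a:// scheme instead of s3:// when specifying paths.
val input = new Path("s3a://something/input")

// getFileSystem resolves an S3A filesystem from the path's scheme.
val fs = input.getFileSystem(conf)
val status = fs.listStatus(input)

In a Spark job the same properties can be set on sc.hadoopConfiguration before any FileSystem is created.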