What is the correct way to get a Hadoop FileSystem object that can be used for reading from/writing to HDFS?
Question
What is the correct way to create a FileSystem object that can be used for reading from/writing to HDFS? In some examples I've found, they do something like this:
final Configuration conf = new Configuration();
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/core-site.xml"));
conf.addResource(new Path("/usr/local/hadoop/etc/hadoop/hdfs-site.xml"));
final FileSystem fs = FileSystem.get(conf);
From looking at the documentation for the Configuration class, it looks like the properties from core-site.xml are automatically loaded when the object is created if that file is on the classpath, so there is no need to set it again.
I haven't found anything that says why adding hdfs-site.xml would be required, and it seems to work fine without it.
Would it be safe to just put core-site.xml on the classpath and skip hdfs-site.xml, or should I be setting both like I've seen in the examples? In what cases would the properties from hdfs-site.xml be required?
FileSystem needs only one configuration key to connect to HDFS successfully. Previously it was fs.default.name; from Hadoop 2.x (YARN) onward it was renamed fs.defaultFS. So the following snippet is sufficient for the connection.
Configuration conf = new Configuration();
conf.set(key, "hdfs://host:port"); // where key="fs.default.name"|"fs.defaultFS"
FileSystem fs = FileSystem.get(conf);
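Once the FileSystem object is obtained this way, reading and writing work through fs.create() and fs.open(). A minimal sketch, assuming a hypothetical NameNode at hdfs://namenode:8020 and a scratch path of my choosing (substitute the value from your own core-site.xml):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "namenode" and 8020 are placeholders; use the authority from your core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/example.txt");

        // Write: create() returns an FSDataOutputStream; "true" overwrites an existing file.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Read the line back from the same path.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }

        fs.close();
    }
}
```

Note that fs.defaultFS only needs to be set explicitly like this when core-site.xml is not on the classpath; otherwise the bare new Configuration() already carries it.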
Tip: check core-site.xml to see which of the two keys is present, and set that same key and value in conf. If the machine you run the code from doesn't have a hostname mapping for the NameNode, use its IP instead. On a MapR cluster the value will have a prefix like maprfs://.
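The "check which key exists" step can also be done programmatically with plain JDK XML parsing, without any Hadoop dependency. A sketch under my own naming (FindDefaultFs and findDefaultFs are hypothetical helpers; the path is the one from the question's examples):

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FindDefaultFs {
    // Returns the value of fs.defaultFS or fs.default.name from a
    // Hadoop-style configuration file, or null if neither key is present.
    static String findDefaultFs(File coreSiteXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(coreSiteXml);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            if (names.getLength() == 0) {
                continue; // malformed <property> without a <name> child
            }
            String name = names.item(0).getTextContent().trim();
            if (name.equals("fs.defaultFS") || name.equals("fs.default.name")) {
                return prop.getElementsByTagName("value").item(0).getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        File f = new File(args.length > 0 ? args[0]
                : "/usr/local/hadoop/etc/hadoop/core-site.xml");
        if (f.exists()) {
            System.out.println(findDefaultFs(f));
        }
    }
}
```

Whichever key the file uses, the returned value is exactly what should go into conf.set(...) in the snippet above.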