How to access HDFS by a URI consisting of H/A namenodes from Spark, which is outside the Hadoop cluster?


Problem description



Now I have some Spark applications which store output to HDFS.

Since our Hadoop cluster uses namenode H/A, and the Spark cluster is outside of the Hadoop cluster (I know this is a bad setup), I need to specify an HDFS URI to the application so that it can access HDFS.

But it doesn't recognize the nameservice, so I can only give one namenode's URI, and if that namenode fails, I have to modify the configuration file and try again.

Accessing ZooKeeper to find out which namenode is active seems very annoying, so I'd like to avoid it.

Could you suggest any alternatives?

Solution

Suppose your nameservice is 'hadooptest', then set the Hadoop configuration like below. You can get this information from the hdfs-site.xml file of the remote HA-enabled HDFS.

sc.hadoopConfiguration.set("dfs.nameservices", "hadooptest")
sc.hadoopConfiguration.set("dfs.client.failover.proxy.provider.hadooptest", "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")
sc.hadoopConfiguration.set("dfs.ha.namenodes.hadooptest", "nn1,nn2")
sc.hadoopConfiguration.set("dfs.namenode.rpc-address.hadooptest.nn1", "10.10.14.81:8020")
sc.hadoopConfiguration.set("dfs.namenode.rpc-address.hadooptest.nn2", "10.10.14.82:8020")

After this, you can use the URL with 'hadooptest' like below.

test.write.orc("hdfs://hadooptest/tmp/test/r1")
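
As an alternative, the same values can be supplied through SparkConf with the "spark.hadoop." prefix, which Spark copies into the Hadoop configuration when the context is created. Below is only a minimal sketch reusing the example nameservice and addresses from the snippet above; adjust them to your own cluster, and the app name is just a placeholder.

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: properties prefixed with "spark.hadoop." are propagated into the
// Hadoop Configuration used by the SparkContext. The nameservice and namenode
// addresses are the example values from above, not real cluster settings.
val conf = new SparkConf()
  .setAppName("hdfs-ha-example")
  .set("spark.hadoop.dfs.nameservices", "hadooptest")
  .set("spark.hadoop.dfs.client.failover.proxy.provider.hadooptest",
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")
  .set("spark.hadoop.dfs.ha.namenodes.hadooptest", "nn1,nn2")
  .set("spark.hadoop.dfs.namenode.rpc-address.hadooptest.nn1", "10.10.14.81:8020")
  .set("spark.hadoop.dfs.namenode.rpc-address.hadooptest.nn2", "10.10.14.82:8020")
val sc = new SparkContext(conf)

The same prefix also works with spark-submit --conf options, so the HA settings can live outside the application code.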

Check here for more information.
