Namenode HA (UnknownHostException: nameservice1)


Problem description

We enable Namenode High Availability through Cloudera Manager, using

Cloudera Manager >> HDFS >> Actions >> Enable High Availability >> selected the standby Namenode & Journal Nodes, then named the nameservice nameservice1.

Once the whole process completed, we deployed the client configuration.

We tested from a client machine by listing HDFS directories (hadoop fs -ls /), then manually failed over to the standby namenode and listed the HDFS directories again (hadoop fs -ls /). This test worked perfectly.

But when I ran a Hadoop sleep job using the following command, it failed:

$ hadoop jar /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop-0.20-mapreduce/hadoop-examples.jar sleep -m 1 -r 0
java.lang.IllegalArgumentException: java.net.UnknownHostException: nameservice1
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:448)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:410)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:128)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2308)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:87)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2342)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2324)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:351)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:103)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:980)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:974)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:974)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:948)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1410)
at org.apache.hadoop.examples.SleepJob.run(SleepJob.java:174)
at org.apache.hadoop.examples.SleepJob.run(SleepJob.java:237)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.examples.SleepJob.main(SleepJob.java:165)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.net.UnknownHostException: nameservice1
... 37 more

I don't know why it is not able to resolve nameservice1 even after deploying the client configuration.

When I googled this issue, I found only one solution:

Add the below entries to the configuration to fix the issue:

dfs.nameservices=nameservice1
dfs.ha.namenodes.nameservice1=namenode1,namenode2
dfs.namenode.rpc-address.nameservice1.namenode1=ip-10-118-137-215.ec2.internal:8020
dfs.namenode.rpc-address.nameservice1.namenode2=ip-10-12-122-210.ec2.internal:8020
dfs.client.failover.proxy.provider.nameservice1=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
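Expressed in hdfs-site.xml form, the properties above would look like the fragment below. This is a sketch based on the values quoted in this question; the nameservice name and the EC2 hostnames are specific to this cluster and would differ elsewhere:

```xml
<!-- HA client settings for nameservice1 (values taken from the question above) -->
<property>
  <name>dfs.nameservices</name>
  <value>nameservice1</value>
</property>
<property>
  <name>dfs.ha.namenodes.nameservice1</name>
  <value>namenode1,namenode2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode1</name>
  <value>ip-10-118-137-215.ec2.internal:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode2</name>
  <value>ip-10-12-122-210.ec2.internal:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.nameservice1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

With dfs.nameservices defined, the client treats nameservice1 as a logical name and resolves it through the failover proxy provider instead of DNS, which is exactly what fails with UnknownHostException when these entries are missing.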

My impression was that Cloudera Manager takes care of this. I checked the client for this configuration, and the configuration was there (/var/run/cloudera-scm-agent/process/1998-deploy-client-config/hadoop-conf/hdfs-site.xml).

Here are some more details of the config files:

[11:22:37 root@datasci01.dev:~]# ls -l /etc/hadoop/conf.cloudera.*
/etc/hadoop/conf.cloudera.hdfs:
total 16
-rw-r--r-- 1 root root  943 Jul 31 09:33 core-site.xml
-rw-r--r-- 1 root root 2546 Jul 31 09:33 hadoop-env.sh
-rw-r--r-- 1 root root 1577 Jul 31 09:33 hdfs-site.xml
-rw-r--r-- 1 root root  314 Jul 31 09:33 log4j.properties

/etc/hadoop/conf.cloudera.hdfs1:
total 20
-rwxr-xr-x 1 root root  233 Sep  5  2013 container-executor.cfg
-rw-r--r-- 1 root root 1890 May 21 15:48 core-site.xml
-rw-r--r-- 1 root root 2546 May 21 15:48 hadoop-env.sh
-rw-r--r-- 1 root root 1577 May 21 15:48 hdfs-site.xml
-rw-r--r-- 1 root root  314 May 21 15:48 log4j.properties

/etc/hadoop/conf.cloudera.mapreduce:
total 20
-rw-r--r-- 1 root root 1032 Jul 31 09:33 core-site.xml
-rw-r--r-- 1 root root 2775 Jul 31 09:33 hadoop-env.sh
-rw-r--r-- 1 root root 1450 Jul 31 09:33 hdfs-site.xml
-rw-r--r-- 1 root root  314 Jul 31 09:33 log4j.properties
-rw-r--r-- 1 root root 2446 Jul 31 09:33 mapred-site.xml

/etc/hadoop/conf.cloudera.mapreduce1:
total 24
-rwxr-xr-x 1 root root  233 Sep  5  2013 container-executor.cfg
-rw-r--r-- 1 root root 1979 May 16 12:20 core-site.xml
-rw-r--r-- 1 root root 2775 May 16 12:20 hadoop-env.sh
-rw-r--r-- 1 root root 1450 May 16 12:20 hdfs-site.xml
-rw-r--r-- 1 root root  314 May 16 12:20 log4j.properties
-rw-r--r-- 1 root root 2446 May 16 12:20 mapred-site.xml
[11:23:12 root@datasci01.dev:~]# 

I suspect it is an issue with the old configuration in /etc/hadoop/conf.cloudera.hdfs1 & /etc/hadoop/conf.cloudera.mapreduce1, but I'm not sure.

It looks like /etc/hadoop/conf/* never got updated:

# ls -l /etc/hadoop/conf/
total 24
-rwxr-xr-x 1 root root  233 Sep  5  2013 container-executor.cfg
-rw-r--r-- 1 root root 1979 May 16 12:20 core-site.xml
-rw-r--r-- 1 root root 2775 May 16 12:20 hadoop-env.sh
-rw-r--r-- 1 root root 1450 May 16 12:20 hdfs-site.xml
-rw-r--r-- 1 root root  314 May 16 12:20 log4j.properties
-rw-r--r-- 1 root root 2446 May 16 12:20 mapred-site.xml
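One quick way to confirm which of the candidate configuration directories holds a stale hdfs-site.xml is to compare modification times, as the ls listings above suggest. This is a minimal sketch, not any standard Hadoop or Cloudera tooling; the directory list is taken from this question and would differ per host:

```python
import os

def newest_hdfs_site(conf_dirs):
    """Return the directory whose hdfs-site.xml has the latest mtime.

    Directories that do not contain an hdfs-site.xml are skipped;
    returns None if no candidate directory has one.
    """
    candidates = []
    for d in conf_dirs:
        path = os.path.join(d, "hdfs-site.xml")
        if os.path.isfile(path):
            candidates.append((os.path.getmtime(path), d))
    if not candidates:
        return None
    # max() picks the (mtime, dir) pair with the newest timestamp
    return max(candidates)[1]

if __name__ == "__main__":
    dirs = [
        "/etc/hadoop/conf",
        "/etc/hadoop/conf.cloudera.hdfs",
        "/etc/hadoop/conf.cloudera.hdfs1",
    ]
    print(newest_hdfs_site(dirs))
```

If /etc/hadoop/conf is not the directory with the newest hdfs-site.xml, the client is reading a pre-HA copy of the configuration, which matches the symptoms above.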

Does anyone have any idea about this issue?

Thanks

Solution

It looks like you are using the wrong client configuration in the /etc/hadoop/conf directory. Sometimes the Cloudera Manager (CM) "Deploy Client Configuration" option may not work.

Since you have enabled NN HA, you should have valid core-site.xml and hdfs-site.xml files in your Hadoop client configuration directory. To get the valid site files, go to the HDFS service in CM and choose the Download Client Configuration option from the Actions button. You will get the configuration files in zip format; extract the zip file and replace /etc/hadoop/conf/core-site.xml and /etc/hadoop/conf/hdfs-site.xml with the extracted core-site.xml and hdfs-site.xml files.
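As a sanity check after replacing the files, one could verify that the new hdfs-site.xml actually defines the HA properties for nameservice1. The sketch below assumes the property names quoted earlier in this question; `missing_ha_keys` is a hypothetical helper, not part of any Hadoop or Cloudera tooling:

```python
import xml.etree.ElementTree as ET

# HA-related properties an HA-enabled client hdfs-site.xml should define
# (names taken from the question above; they are specific to nameservice1).
REQUIRED_KEYS = [
    "dfs.nameservices",
    "dfs.ha.namenodes.nameservice1",
    "dfs.namenode.rpc-address.nameservice1.namenode1",
    "dfs.namenode.rpc-address.nameservice1.namenode2",
    "dfs.client.failover.proxy.provider.nameservice1",
]

def missing_ha_keys(hdfs_site_xml, required=REQUIRED_KEYS):
    """Return the required property names absent from an hdfs-site.xml string."""
    root = ET.fromstring(hdfs_site_xml)
    # Hadoop site files are <configuration><property><name>..</name>..</property>..
    present = {prop.findtext("name") for prop in root.iter("property")}
    return [key for key in required if key not in present]
```

Running missing_ha_keys on the contents of /etc/hadoop/conf/hdfs-site.xml should return an empty list on a correctly deployed HA client; a non-empty list suggests the stale pre-HA configuration is still in place.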
