Hadoop distcp between two secured (Kerberos) clusters
Problem description
I have two Hadoop clusters, and both are running the same Hadoop version. I also have a user "testuser" (for example) in both clusters, so the testuser keytabs are present in both.
Namenode#1 (source cluster): hdfs://nn1:8020
Namenode#2 (dest cluster): hdfs://nn2:8020
I want to copy some files from one cluster to the other using hadoop distcp. For example: in the source cluster I have a file at path "/user/testuser/temp/file-r-0000", and in the destination cluster the target directory is "/user/testuser/dest/". So what I want is to copy file-r-0000 from the source cluster into the destination cluster's "dest" directory.
I have tried these so far:
hadoop distcp hdfs://nn1:8020/user/testuser/temp/file-r-0000 hdfs://nn2:8020/user/testuser/dest
hadoop distcp hftp://nn1:8020/user/testuser/temp/file-r-0000 hdfs://nn2:8020/user/testuser/dest
I believe I do not need to use "hftp://" since both clusters run the same Hadoop version. I also tried the commands from both clusters, but all I get are some security-related exceptions.
When running from the destination cluster with hftp:
14/02/26 00:04:45 ERROR security.UserGroupInformation: PriviledgedActionException as:testuser@realm cause:java.net.SocketException: Unexpected end of file from server
14/02/26 00:04:45 ERROR security.UserGroupInformation: PriviledgedActionException as:testuser@realm cause:java.net.SocketException: Unexpected end of file from server
14/02/26 00:04:45 INFO fs.FileSystem: Couldn't get a delegation token from nn1ipaddress:8020
When running from the source cluster:
14/02/26 00:05:43 ERROR security.UserGroupInformation: PriviledgedActionException as:testuser@realm1 cause:java.io.IOException: Couldn't setup connection for testuser@realm1 to nn/realm2
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Call to nn1ipaddress failed on local exception: java.io.IOException: Couldn't setup connection for testuser@realm1 to nn/realm2
Caused by: java.io.IOException: Couldn't setup connection for testuser@realm1 to nn/realm2
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:560)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:513)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:616)
    at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:203)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1254)
    at org.apache.hadoop.ipc.Client.call(Client.java:1098)
    ... 26 more
It also showed that the host address is not present in the Kerberos database (I don't have the exact log for that).
So, do I need to configure Kerberos differently in order to use distcp between them? Or am I missing something here?
Any information will be highly appreciated. Thanks in advance.
Solution
Cross-realm authentication is required to use distcp between two secured clusters, and it was not configured in these two clusters. After setting up cross-realm authentication correctly, it worked.
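The accepted answer is terse, so here is a hedged sketch of what cross-realm setup typically involves with MIT Kerberos and Hadoop. The realm names REALM1/REALM2, KDC hostnames, and domain names below are placeholders, not details taken from the question. In broad strokes: both KDCs need matching cross-realm ticket-granting principals (krbtgt/REALM2@REALM1 and krbtgt/REALM1@REALM2, created with the same password and encryption types on each side), the krb5.conf on cluster hosts needs to route between the realms, and Hadoop needs auth_to_local rules so principals from the remote realm map to local user names:

```ini
; /etc/krb5.conf on hosts in both clusters (illustrative names)
[realms]
    REALM1 = {
        kdc = kdc1.example.com
        admin_server = kdc1.example.com
    }
    REALM2 = {
        kdc = kdc2.example.com
        admin_server = kdc2.example.com
    }

[capaths]
    ; "." means a direct trust path, with no intermediate realm
    REALM1 = {
        REALM2 = .
    }
    REALM2 = {
        REALM1 = .
    }

[domain_realm]
    .cluster1.example.com = REALM1
    .cluster2.example.com = REALM2
```

```xml
<!-- core-site.xml on both clusters: map principals from the other
     realm (here REALM2) to short local user names, so a principal
     like testuser@REALM2 resolves to the local user "testuser" -->
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](.*@REALM2)s/@.*//
    RULE:[2:$1@$0](.*@REALM2)s/@.*//
    DEFAULT
  </value>
</property>
```

After changes like these, running kinit in one realm and listing the remote cluster's HDFS (hadoop fs -ls hdfs://nn2:8020/) is a quick way to confirm the trust works before retrying distcp.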