Hadoop distcp between two secured (Kerberos) clusters


Question



I have two Hadoop clusters, both running the same Hadoop version. I also have a user "testuser" (example) in both clusters (so testuser keytabs are present in both).

Namenode#1 (source cluster): hdfs://nn1:8020
Namenode#2 (dest cluster): hdfs://nn2:8020

I want to copy some files from one cluster to another using hadoop distcp. Example: in source cluster I have a file with path "/user/testuser/temp/file-r-0000" and in destination cluster, the destination directory is "/user/testuser/dest/". So what I want is to copy file-r-0000 from source cluster to target cluster's "dest" directory.

I have tried these so far:

hadoop distcp hdfs://nn1:8020/user/testuser/temp/file-r-0000 hdfs://nn2:8020/user/testuser/dest

hadoop distcp hftp://nn1:8020/user/testuser/temp/file-r-0000 hdfs://nn2:8020/user/testuser/dest

I believe I do not need to use "hftp://" since I have the same version of Hadoop. I also tried both commands from each cluster, but all I'm getting are some security-related exceptions.

When running from destination cluster with hftp:

14/02/26 00:04:45 ERROR security.UserGroupInformation: PriviledgedActionException as:testuser@realm cause:java.net.SocketException: Unexpected end of file from server
14/02/26 00:04:45 ERROR security.UserGroupInformation: PriviledgedActionException as:testuser@realm cause:java.net.SocketException: Unexpected end of file from server
14/02/26 00:04:45 INFO fs.FileSystem: Couldn't get a delegation token from nn1ipaddress:8020

When running from source cluster:

14/02/26 00:05:43 ERROR security.UserGroupInformation: PriviledgedActionException as:testuser@realm1 cause:java.io.IOException: Couldn't setup connection for testuser@realm1 to nn/realm2
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Call to nn1ipaddress failed on local exception: java.io.IOException: Couldn't setup connection for testuser@realm1 to nn/realm2


Caused by: java.io.IOException: Couldn't setup connection for testuser@realm1 to nn/realm2
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:560)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:513)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:616)
    at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:203)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1254)
    at org.apache.hadoop.ipc.Client.call(Client.java:1098)
    ... 26 more

It also shows me that the host address is not present in the Kerberos database (I don't have the exact log for that).

So, do I need to configure Kerberos in a different way in order to use distcp between them? Or am I missing something here?

Any information will be highly appreciated. Thanks in advance.

Solution

Cross-realm authentication is required to use distcp between two secured clusters, and it was not configured between these two. After setting up cross-realm authentication correctly, distcp worked.
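For reference, here is a rough sketch of what setting up cross-realm trust involves for an MIT KDC. All realm names, KDC hostnames, and domain names below are placeholders (the logs above only show them redacted as "realm1"/"realm2"), and the exact steps depend on your KDC and Hadoop version:

```
# Sketch only -- REALM1.EXAMPLE.COM / REALM2.EXAMPLE.COM, kdc1/kdc2, and the
# domain names are placeholder assumptions, not values from the question.

# 1. In BOTH KDCs, create matching cross-realm ticket-granting principals.
#    The two copies of each principal must share the same password/key
#    (run inside kadmin on each KDC):
#      addprinc krbtgt/REALM2.EXAMPLE.COM@REALM1.EXAMPLE.COM
#      addprinc krbtgt/REALM1.EXAMPLE.COM@REALM2.EXAMPLE.COM

# 2. In krb5.conf on the Hadoop nodes of both clusters, make both realms
#    resolvable so clients can reach the other realm's KDC:
#    [realms]
#      REALM1.EXAMPLE.COM = { kdc = kdc1.example.com }
#      REALM2.EXAMPLE.COM = { kdc = kdc2.example.com }
#    [domain_realm]
#      .cluster1.example.com = REALM1.EXAMPLE.COM
#      .cluster2.example.com = REALM2.EXAMPLE.COM

# 3. In core-site.xml on both clusters, map principals from the other realm
#    to local user names:
#    <property>
#      <name>hadoop.security.auth_to_local</name>
#      <value>
#        RULE:[1:$1@$0](.*@REALM2\.EXAMPLE\.COM)s/@.*//
#        DEFAULT
#      </value>
#    </property>
```

Without step 1, the client in realm1 cannot obtain a service ticket for `nn/realm2`, which matches the "Couldn't setup connection for testuser@realm1 to nn/realm2" error above; without step 3, even an authenticated foreign principal would not map to a local user on the remote cluster.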
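The `hadoop.security.auth_to_local` rules mentioned above decide which local account a foreign principal maps to. As a rough illustration of how a one-component rule with a realm filter and a sed-style substitution behaves, here is a simplified Python model (an assumption for illustration only, not Hadoop's actual `KerberosName` parser; the realm names are placeholders):

```python
import re

def map_principal(principal: str) -> str:
    """Simplified model of a rule like
    RULE:[1:$1@$0](.*@REALM2\\.EXAMPLE\\.COM)s/@.*//
    plus DEFAULT for the local realm: accept a one-component principal
    from a trusted realm and strip the realm suffix."""
    match = re.fullmatch(r"(?P<user>[^/@]+)@(?P<realm>[^@]+)", principal)
    if not match:
        raise ValueError(f"unsupported principal: {principal}")
    # The regex filter part of the rule: only trusted realms match.
    if match.group("realm") not in ("REALM1.EXAMPLE.COM", "REALM2.EXAMPLE.COM"):
        raise ValueError(f"no rule matched: {principal}")
    # The s/@.*// substitution: drop the realm, keep the short name.
    return match.group("user")

print(map_principal("testuser@REALM2.EXAMPLE.COM"))  # testuser
```

If no rule matches the remote realm, the real Hadoop mapping fails in the same spirit as the `ValueError` above, and the remote user is rejected even though Kerberos authentication itself succeeded.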
