How to do kerberos authentication on a flink standalone installation?

Problem description

I have a standalone Flink installation on top of which I want to run a streaming job that writes data into an HDFS installation. The HDFS installation is part of a Cloudera deployment and requires Kerberos authentication in order to read from and write to HDFS. Since I found no documentation on how to make Flink connect to a Kerberos-protected HDFS, I had to make some educated guesses about the procedure. Here is what I have done so far:

  • I created a keytab file for my user.
  • In my Flink job, I added the following code (a fuller sketch follows after this list):

UserGroupInformation.loginUserFromKeytab("myusername", "/path/to/keytab");

  • Finally, I am using a TextOutputFormat to write data to HDFS.
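
Putting those steps together, here is a minimal, self-contained sketch of the login step, assuming the problem is that the client never switched out of SIMPLE mode. The explicit conf.set call, the EXAMPLE.COM realm, and the class wrapper are illustrative additions, not something the Flink docs prescribe:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLoginSketch {
    public static void main(String[] args) throws Exception {
        // Force the Hadoop client into Kerberos mode before the keytab login.
        // Without this (or a core-site.xml on the classpath that sets it),
        // UGI stays in SIMPLE mode and the NameNode rejects the request,
        // matching the AccessControlException shown below.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Principal and keytab path are placeholders from the question;
        // EXAMPLE.COM stands in for the real Kerberos realm.
        UserGroupInformation.loginUserFromKeytab("myusername@EXAMPLE.COM", "/path/to/keytab");
        System.out.println("Logged in as: " + UserGroupInformation.getCurrentUser());
    }
}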

    When I run the job, I'm getting the following error:

    org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
            at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
            at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
            at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
            at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
            at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
            at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
            at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1730)
            at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1668)
            at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1593)
            at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:397)
            at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:393)
            at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
            at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:393)
            at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:337)
            at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
            at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
            at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
            at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:405)
    

    For some odd reason, Flink seems to try SIMPLE authentication, even though I called loginUserFromKeytab. I found a similar question on Stack Overflow (Error with Kerberos authentication when executing Flink example code on YARN cluster (Cloudera)) with an answer explaining that:

    Standalone Flink currently only supports accessing Kerberos secured HDFS if the user is authenticated on all worker nodes.

    That may mean that I have to do some authentication at the OS level, e.g. with kinit. Since my knowledge of Kerberos is very limited, I have no idea how I would do that. I would also like to understand how a program run after kinit actually knows which Kerberos ticket to pick from the local cache when there is no configuration for this whatsoever.
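
    (A hedged sketch of that mechanism: with MIT Kerberos, kinit writes the ticket to a credential cache that programs resolve by convention, $KRB5CCNAME if set, otherwise a per-user default such as /tmp/krb5cc_<uid> on Linux, which is how a process picks the right ticket with no extra configuration. Hadoop's UGI can log in from such a cache; the cache path, the uid 1000, and the user name below are placeholder assumptions:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class TicketCacheSketch {
        public static void main(String[] args) throws Exception {
            // Security must be enabled for a Kerberos login from the cache.
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);
            // /tmp/krb5cc_1000 assumes uid 1000; check `klist` for the real path.
            UserGroupInformation ugi =
                    UserGroupInformation.getUGIFromTicketCache("/tmp/krb5cc_1000", "myusername");
            System.out.println("auth method: " + ugi.getAuthenticationMethod());
        }
    }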

    Answer

    I'm not a Flink user, but based on what I've seen with Spark & friends, my guess is that "authenticated on all worker nodes" means that each worker process has:

    1. a core-site.xml config available on local fs with hadoop.security.authentication set to kerberos (among other things)

    2. the local dir containing core-site.xml added to the CLASSPATH so that it is found automatically by the Hadoop Configuration object [otherwise it reverts silently to default hard-coded values, duh]
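
    A quick probe for these two points, as a sketch (hadoop.security.authentication is the real Hadoop key; the class wrapper is illustrative):

    import org.apache.hadoop.conf.Configuration;

    public class ConfProbe {
        public static void main(String[] args) {
            // new Configuration() loads core-default.xml plus, if present on the
            // CLASSPATH, core-site.xml. If the latter is missing, the property
            // below comes from the hard-coded default, which is "simple".
            Configuration conf = new Configuration();
            System.out.println(conf.get("hadoop.security.authentication"));
        }
    }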

    That UGI "login" method is incredibly verbose, so if it was indeed called before Flink tries to initiate the HDFS client from the Configuration, you will notice. On the other hand, if you don't see the verbose stuff, then your attempt to create a private Kerberos TGT is bypassed by Flink, and you have to find a way to bypass Flink :-/
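
    To make that check concrete, a small probe along the same lines (the UGI accessors and the sun.security.krb5.debug flag exist in stock Hadoop and HotSpot JVMs, but treat the exact output as environment-dependent):

    import org.apache.hadoop.security.UserGroupInformation;

    public class UgiProbe {
        public static void main(String[] args) throws Exception {
            // Run with -Dsun.security.krb5.debug=true for the JDK's own Kerberos trace.
            UserGroupInformation current = UserGroupInformation.getCurrentUser();
            System.out.println("user: " + current.getUserName());
            System.out.println("auth: " + current.getAuthenticationMethod()); // KERBEROS vs SIMPLE
            System.out.println("kerberos creds: " + current.hasKerberosCredentials());
        }
    }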
