How to do Kerberos authentication on a Flink standalone installation?


Question

I have a standalone Flink installation on top of which I want to run a streaming job that writes data into an HDFS installation. The HDFS installation is part of a Cloudera deployment and requires Kerberos authentication in order to read and write HDFS. Since I found no documentation on how to make Flink connect to a Kerberos-protected HDFS, I had to make some educated guesses about the procedure. Here is what I did so far:

  • I created a keytab file for my user.
  • In my Flink job, I added the following code:

UserGroupInformation.loginUserFromKeytab("myusername", "/path/to/keytab");

  • Finally, I am using a TextOutputFormat to write data to HDFS; a minimal sketch of these steps follows below.
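Putting those pieces together, this is a minimal sketch of the kind of job I mean (the principal, keytab path, HDFS URL, and the dummy source are placeholders, not my actual code):

    import org.apache.flink.api.java.io.TextOutputFormat;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosHdfsJob {
        public static void main(String[] args) throws Exception {
            // Log in from the keytab before anything touches HDFS in this JVM.
            UserGroupInformation.loginUserFromKeytab("myusername", "/path/to/keytab");

            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Placeholder source standing in for the real stream.
            DataStream<String> lines = env.fromElements("hello", "world");

            // Write to the Kerberos-protected HDFS via a TextOutputFormat.
            lines.writeUsingOutputFormat(
                    new TextOutputFormat<String>(new Path("hdfs://namenode:8020/user/myusername/out")));

            env.execute("Kerberos HDFS test");
        }
    }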

When I run the job, I'm getting the following error:

    org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
            at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
            at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
            at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
            at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
            at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
            at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
            at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1730)
            at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1668)
            at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1593)
            at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:397)
            at org.apache.hadoop.hdfs.DistributedFileSystem$6.doCall(DistributedFileSystem.java:393)
            at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
            at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:393)
            at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:337)
            at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
            at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
            at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
            at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create(HadoopFileSystem.java:405)
    

For some odd reason, Flink seems to try SIMPLE authentication, even though I called loginUserFromKeytab. I found a similar question on Stack Overflow (Error with Kerberos authentication when executing Flink example code on YARN cluster (Cloudera)) which had an answer explaining that:

Standalone Flink currently only supports accessing Kerberos-secured HDFS if the user is authenticated on all worker nodes.

That may mean that I have to do some authentication at the OS level, e.g. with kinit. Since my knowledge of Kerberos is very limited, I have no idea how I would do that. I would also like to understand how a program running after kinit knows which Kerberos ticket to pick from the local cache when there is no configuration whatsoever regarding this.
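From what I gather so far, the OS-level login would look something like this (the principal and keytab path are placeholders), and programs apparently find the resulting ticket through the default credential cache, usually /tmp/krb5cc_<uid>, or wherever the KRB5CCNAME environment variable points, rather than through any per-program configuration:

    # Obtain a ticket non-interactively from a keytab (placeholders):
    kinit -kt /path/to/keytab myusername@EXAMPLE.COM

    # Inspect the credential cache that the Kerberos libraries pick up by default:
    klist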

Answer

I'm not a Flink user, but based on what I've seen with Spark & friends, my guess is that "authenticated on all worker nodes" means that each worker process has:

1. a core-site.xml config available on the local fs with hadoop.security.authentication set to kerberos (among other things); see the sketch after this list

2. the local dir containing core-site.xml added to the CLASSPATH so that it is found automatically by the Hadoop Configuration object [it will silently revert to default hard-coded values otherwise, duh]
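As a sketch, the relevant part of such a core-site.xml might look like this (only hadoop.security.authentication is taken from point 1 above; the authorization flag is a typical companion setting in secure clusters, not something quoted from the question):

    <configuration>
      <!-- Switch the Hadoop client from SIMPLE to Kerberos authentication. -->
      <property>
        <name>hadoop.security.authentication</name>
        <value>kerberos</value>
      </property>
      <!-- Commonly enabled alongside Kerberos in secure clusters (assumption). -->
      <property>
        <name>hadoop.security.authorization</name>
        <value>true</value>
      </property>
    </configuration>

And a common way to get that directory picked up is to export HADOOP_CONF_DIR=/etc/hadoop/conf (path is a placeholder) in the environment of each worker before it starts, the way Spark & friends do.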

That UGI "login" method is incredibly verbose, so if it was indeed called before Flink tries to initiate the HDFS client from the Configuration, you will notice. On the other hand, if you don't see the verbose stuff, then your attempt to create a private Kerberos TGT is bypassed by Flink, and you have to find a way to bypass Flink :-/
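If in doubt, a quick sanity check with the standard Hadoop UGI calls, placed right after the keytab login, would show whether the Kerberos login actually took effect in that JVM:

    // Diagnostic only: print who this JVM is running as and whether
    // Kerberos credentials and Hadoop security are actually in effect.
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    System.out.println("Current user: " + ugi);
    System.out.println("Has Kerberos credentials: " + ugi.hasKerberosCredentials());
    System.out.println("Security enabled: " + UserGroupInformation.isSecurityEnabled());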

