Error with Kerberos authentication when executing Flink example code on YARN cluster (Cloudera)

Problem description

I was trying to run the example code (flinkexamplesWordCount.jar) with Flink on a YARN cluster, but I am getting the security authentication error below.

org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Cannot initialize task 'DataSink (CsvOutputFormat (path: hdfs://10.94.146.126:8020/user/qawsbtch/flink_out, delimiter:  ))': SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]

I am not sure where the issue is or what I am missing. I can run Spark and MapReduce jobs on the same Cloudera Hadoop cluster without any problem.

I updated the configuration file paths for hdfs-site.xml and core-site.xml in flink-conf.yaml (the same change on the master and worker nodes) and also exported the HADOOP_CONF_DIR path. I also tried giving the host:port in the HDFS file path when executing the flink run command.

Error message

    22:14:25,138 ERROR   org.apache.flink.client.CliFrontend                           - Error while running the command.
org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Cannot initialize task 'DataSink (CsvOutputFormat (path: hdfs://10.94.146.126:8020/user/qawsbtch/flink_out, delimiter:  ))': SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]
        at org.apache.flink.client.program.Client.run(Client.java:413)
        at org.apache.flink.client.program.Client.run(Client.java:356)
        at org.apache.flink.client.program.Client.run(Client.java:349)
        at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:63)
        at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:78)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
        at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
        at org.apache.flink.client.program.Client.run(Client.java:315)
        at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:584)
        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:290)
        at org.apache.flink.client.CliFrontend$2.run(CliFrontend.java:873)
        at org.apache.flink.client.CliFrontend$2.run(CliFrontend.java:870)
        at org.apache.flink.runtime.security.SecurityUtils$1.run(SecurityUtils.java:50)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
        at org.apache.flink.runtime.security.SecurityUtils.runSecured(SecurityUtils.java:47)
        at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:870)
        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:922)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Cannot initialize task 'DataSink (CsvOutputFormat (path: hdfs://10.94.146.126:8020/user/qawsbtch/flink_out, delimiter:  ))': SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]

Recommended answer

(I had a private conversation with the author of the original question to work out this solution.)

The log files posted in the comments on the original question indicate that the job was submitted to a standalone installation of Flink. Standalone Flink currently supports access to Kerberos-secured HDFS only if the user is authenticated on all worker nodes. With Flink on YARN, only the user who starts the job on YARN needs to be authenticated with Kerberos.
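The "SIMPLE authentication is not enabled" message suggests the client tried to talk to HDFS without Kerberos. As a rough way to see which authentication mode the Hadoop configuration on a given node requests, the sketch below reads `hadoop.security.authentication` from core-site.xml. The helper names `hadoop_property` and `hadoop_auth_mode` are made up for illustration, the `/etc/hadoop/conf` fallback is only an assumption about a typical Cloudera layout, and the parser assumes the usual formatting with each `<name>` and `<value>` tag on its own line.

```shell
# Hypothetical helper: print the value of one property from a Hadoop-style
# XML file. Assumes the conventional layout where <name> precedes <value>
# and there is at most one property per line.
hadoop_property() {
  awk -v want="$2" '
    /<name>/ {
      name = $0
      sub(/.*<name>[ \t]*/, "", name)
      sub(/[ \t]*<\/name>.*/, "", name)
    }
    /<value>/ {
      if (name == want) {
        val = $0
        sub(/.*<value>[ \t]*/, "", val)
        sub(/[ \t]*<\/value>.*/, "", val)
        print val
        exit
      }
    }
  ' "$1"
}

# Hypothetical helper: print the authentication mode configured in
# core-site.xml. The fallback directory is an assumption, not a Flink default.
hadoop_auth_mode() {
  conf_dir="${1:-${HADOOP_CONF_DIR:-/etc/hadoop/conf}}"
  mode=$(hadoop_property "$conf_dir/core-site.xml" \
           hadoop.security.authentication 2>/dev/null) || mode=""
  # Hadoop treats a missing property as "simple".
  echo "${mode:-simple}"
}
```

If `hadoop_auth_mode` prints `kerberos`, the cluster expects Kerberos credentials, so the user who starts the YARN session would normally obtain a ticket with `kinit` before running `./bin/yarn-session.sh`.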

The comment section also raised a second issue:

robert@cdh544-worker-0:~/hd22/flink-0.9.0$ ./bin/yarn-session.sh -n 2
20:39:50,563 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at /0.0.0.0:8032
20:39:50,600 INFO  org.apache.flink.yarn.FlinkYarnClient                         - Using values:
20:39:50,602 INFO  org.apache.flink.yarn.FlinkYarnClient                         -  TaskManager count = 2
20:39:50,602 INFO  org.apache.flink.yarn.FlinkYarnClient                         -  JobManager memory = 1024
20:39:50,602 INFO  org.apache.flink.yarn.FlinkYarnClient                         -  TaskManager memory = 1024
20:39:51,708 INFO  org.apache.hadoop.ipc.Client                                  - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
20:39:52,710 INFO  org.apache.hadoop.ipc.Client                                  - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
20:39:53,712 INFO  org.apache.hadoop.ipc.Client                                  - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
20:39:54,714 INFO  org.apache.hadoop.ipc.Client                                  - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

The problem is that you are using Flink 0.9.0 (which bundles Hadoop 2.2.0) on a Hadoop/YARN 2.6.0 cluster with YARN HA enabled. Flink's older (2.2.0) Hadoop library cannot correctly read the ResourceManager address in an HA setup.

Downloading the Flink build for Hadoop 2.6.0 will make it work.
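To check whether a cluster is in the situation described above, one option is to look for the `yarn.resourcemanager.ha.enabled` property in yarn-site.xml. The following is only a sketch: `xml_property` and `yarn_ha_enabled` are hypothetical helper names, the `/etc/hadoop/conf` fallback is an assumption about the node's layout, and the parser expects each `<name>` and `<value>` tag on its own line.

```shell
# Hypothetical helper: print the value of one property from a Hadoop-style
# XML file (one <name>/<value> pair per property, tags on separate lines).
xml_property() {
  awk -v want="$2" '
    /<name>/ {
      name = $0
      sub(/.*<name>[ \t]*/, "", name)
      sub(/[ \t]*<\/name>.*/, "", name)
    }
    /<value>/ {
      if (name == want) {
        val = $0
        sub(/.*<value>[ \t]*/, "", val)
        sub(/[ \t]*<\/value>.*/, "", val)
        print val
        exit
      }
    }
  ' "$1"
}

# Hypothetical helper: exit status 0 when yarn-site.xml enables
# ResourceManager HA, non-zero otherwise (including a missing file).
yarn_ha_enabled() {
  conf_dir="${1:-${HADOOP_CONF_DIR:-/etc/hadoop/conf}}"
  enabled=$(xml_property "$conf_dir/yarn-site.xml" \
              yarn.resourcemanager.ha.enabled 2>/dev/null) || enabled=""
  [ "$enabled" = "true" ]
}
```

A call such as `yarn_ha_enabled && echo "YARN HA is on"` would then indicate that a Flink build bundling the older Hadoop 2.2.0 client is likely to fall back to the default ResourceManager address `0.0.0.0:8032`, as seen in the log above.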
