Error with Kerberos authentication when executing Flink example code on a YARN cluster (Cloudera)
Problem description
I was trying to run the Flink example code (flink examples WordCount.jar) on a YARN cluster, but I am getting the security authentication error below.
org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Cannot initialize task 'DataSink (CsvOutputFormat (path: hdfs://10.94.146.126:8020/user/qawsbtch/flink_out, delimiter: ))': SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
I am not sure where the issue is or what I am missing. I can run Spark and MapReduce jobs on the same Cloudera Hadoop cluster without any issue.
I updated the configuration file paths for hdfs-site.xml and core-site.xml in flink-conf.yaml (the same change on the master and worker nodes) and also exported HADOOP_CONF_DIR. I also tried giving host:port in the HDFS file path when executing the flink run command.
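As a quick sanity check of the setup described above, the following Python sketch reads core-site.xml from a Hadoop configuration directory and reports the value of the Hadoop property hadoop.security.authentication. The helper names read_hadoop_property and hadoop_auth_mode are hypothetical and are not part of Flink or Hadoop; the fallback to "simple" mirrors Hadoop's documented default when the property is absent.

```python
import os
import xml.etree.ElementTree as ET


def read_hadoop_property(conf_file, name, default=None):
    """Return the value of property `name` from a Hadoop *-site.xml file."""
    root = ET.parse(conf_file).getroot()
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else default
    return default


def hadoop_auth_mode(conf_dir):
    """Report the authentication mode declared in conf_dir/core-site.xml."""
    core_site = os.path.join(conf_dir, "core-site.xml")
    if not os.path.isfile(core_site):
        raise FileNotFoundError(f"no core-site.xml in {conf_dir}")
    # Hadoop falls back to "simple" when the property is not set.
    return read_hadoop_property(
        core_site, "hadoop.security.authentication", "simple"
    )
```

Calling hadoop_auth_mode(os.environ["HADOOP_CONF_DIR"]) on the submitting host and on each worker shows what each node's configuration declares. A value of "simple" (or a missing file) on some node would be consistent with the error in this question, where the client attempts SIMPLE authentication against a NameNode that only accepts TOKEN or KERBEROS.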
Error message
22:14:25,138 ERROR org.apache.flink.client.CliFrontend - Error while running the command.
org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Cannot initialize task 'DataSink (CsvOutputFormat (path: hdfs://10.94.146.126:8020/user/qawsbtch/flink_out, delimiter: ))': SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
at org.apache.flink.client.program.Client.run(Client.java:413)
at org.apache.flink.client.program.Client.run(Client.java:356)
at org.apache.flink.client.program.Client.run(Client.java:349)
at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:63)
at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
at org.apache.flink.client.program.Client.run(Client.java:315)
at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:584)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:290)
at org.apache.flink.client.CliFrontend$2.run(CliFrontend.java:873)
at org.apache.flink.client.CliFrontend$2.run(CliFrontend.java:870)
at org.apache.flink.runtime.security.SecurityUtils$1.run(SecurityUtils.java:50)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.flink.runtime.security.SecurityUtils.runSecured(SecurityUtils.java:47)
at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:870)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:922)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Cannot initialize task 'DataSink (CsvOutputFormat (path: hdfs://10.94.146.126:8020/user/qawsbtch/flink_out, delimiter: ))': SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
Recommended answer
(I had a private conversation with the author of the original question to figure out this solution.)
The log files posted in the comments of the original question indicate that the job was submitted against a standalone installation of Flink. Standalone Flink currently supports accessing Kerberos-secured HDFS only if the user is authenticated on all worker nodes. With Flink on YARN, only the user starting the job on YARN needs to be authenticated with Kerberos.
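Because Flink on YARN relies on the credentials of the user who starts the job, it can help to confirm that a Kerberos ticket exists before launching anything. The sketch below is a hypothetical pre-flight check, not something Flink provides: it assumes the MIT Kerberos klist tool is on the PATH and relies on its -s flag, which produces no output and exits with status 0 only when the credentials cache can be read and has not expired.

```python
import subprocess


def has_valid_kerberos_ticket(klist_cmd="klist"):
    """Return True if `klist -s` reports a readable, unexpired ticket cache."""
    try:
        result = subprocess.run(
            [klist_cmd, "-s"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # klist is not installed, not on PATH, or not executable.
        return False
    return result.returncode == 0
```

If this returns False for the submitting user, running kinit before starting the YARN session would be the usual next step.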
Also, in the comment section, there was another issue:
robert@cdh544-worker-0:~/hd22/flink-0.9.0$ ./bin/yarn-session.sh -n 2
20:39:50,563 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
20:39:50,600 INFO org.apache.flink.yarn.FlinkYarnClient - Using values:
20:39:50,602 INFO org.apache.flink.yarn.FlinkYarnClient - TaskManager count = 2
20:39:50,602 INFO org.apache.flink.yarn.FlinkYarnClient - JobManager memory = 1024
20:39:50,602 INFO org.apache.flink.yarn.FlinkYarnClient - TaskManager memory = 1024
20:39:51,708 INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
20:39:52,710 INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
20:39:53,712 INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
20:39:54,714 INFO org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
The problem is that you are using Flink 0.9.0 (which bundles Hadoop 2.2.0) on a cluster running Hadoop/YARN 2.6.0 with YARN HA enabled. Flink's old (2.2.0) Hadoop library is not able to properly read the ResourceManager address in an HA setup.
Downloading the Flink build that includes Hadoop 2.6.0 will make it work.
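To see whether a cluster is affected, the following Python sketch inspects yarn-site.xml and lists the ResourceManager addresses a client would need. It is an illustration under stated assumptions, not Flink or Hadoop code: the function names are hypothetical, and ${...} variable references inside property values are not expanded. The property names (yarn.resourcemanager.ha.enabled, yarn.resourcemanager.ha.rm-ids, and the per-ID hostname and address keys) and the defaults 0.0.0.0 and port 8032 are the standard YARN ones.

```python
import xml.etree.ElementTree as ET


def load_hadoop_conf(conf_file):
    """Parse a Hadoop *-site.xml file into a dict of property name -> value."""
    props = {}
    for prop in ET.parse(conf_file).getroot().iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def describe_resource_manager(yarn_site):
    """Summarize how the ResourceManager address is configured."""
    conf = load_hadoop_conf(yarn_site)
    ha = conf.get("yarn.resourcemanager.ha.enabled", "false").lower() == "true"
    if not ha:
        # Without HA, a single address is used; YARN defaults to 0.0.0.0:8032.
        host = conf.get("yarn.resourcemanager.hostname", "0.0.0.0")
        address = conf.get("yarn.resourcemanager.address", f"{host}:8032")
        return {"ha": False, "addresses": [address]}

    # With HA, each ResourceManager has its own ID-suffixed properties.
    raw_ids = conf.get("yarn.resourcemanager.ha.rm-ids", "")
    rm_ids = [i.strip() for i in raw_ids.split(",") if i.strip()]
    addresses = []
    for rm_id in rm_ids:
        host = conf.get(f"yarn.resourcemanager.hostname.{rm_id}")
        fallback = f"{host}:8032" if host else None
        addresses.append(
            conf.get(f"yarn.resourcemanager.address.{rm_id}", fallback)
        )
    return {"ha": True, "rm_ids": rm_ids, "addresses": addresses}
```

When the result has "ha" set to True, the addresses live only under the ID-suffixed keys. A client that looks only at the plain yarn.resourcemanager.address key would fall back to the default 0.0.0.0:8032, which would be consistent with the retries shown in the log above.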