Submit Oozie Job from another job's java action with Kerberos
Problem description
I am trying to submit an Oozie job using the Oozie Java Client API from another job's Java action. The cluster is using Kerberos.
Here is my code:
```java
import java.util.Properties;

import org.apache.oozie.client.AuthOozieClient;
import org.apache.oozie.client.OozieClient;

// get an OozieClient for local Oozie
String oozieUrl = "http://hadooputl02.northamerica.xyz.net:11000/oozie/";
AuthOozieClient wc = new AuthOozieClient(oozieUrl);
wc.setDebugMode(1);
// create a workflow job configuration and set the workflow application path
Properties conf = wc.createConfiguration();
conf.setProperty(OozieClient.APP_PATH, wfAppPath);
conf.setProperty("jobTracker", "yarnRM");
conf.setProperty("nameNode", "hdfs://ingestiondev");
// submit and start the workflow job (run() throws OozieClientException)
String jobId = wc.run(conf);
System.out.println("Workflow job submitted: " + jobId);
```
But I am getting the following error:
```
org.apache.oozie.action.hadoop.JavaMainException: IO_ERROR :
java.io.IOException: Error while connecting Oozie server. No of retries = 1. Exception = Could not authenticate, GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
...
Caused by: AUTHENTICATION : Could not authenticate, GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
...
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
...
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
```
I believe the code needs something more to give the node/user access to the Oozie server through Kerberos.
Can someone point me to the correct way to use the Oozie Java API on a Kerberized cluster?
Thanks!
The error message is explicit: "Failed to find any Kerberos tgt". Your job runs in a YARN container, on a random node, and has no Kerberos ticket available there.
Did you ever wonder how Oozie could start a job with your Kerberos credentials, even though it does not know your password? That's because it uses a backdoor built into Hadoop (the proxy-user, or impersonation, mechanism). But your job then has no proper Kerberos credentials of its own, hence the message you see when you try to do something the backdoor does not cover.
How Oozie manages authentication without credentials
- you connect to an Edge Node, create a Kerberos ticket with kinit, and run an Oozie command line to submit a Coordinator (which will fire a Workflow at specific dates and times)
- the Oozie CLI authenticates against the Oozie server with the local Kerberos ticket, so the Coordinator (and Workflow) "belong to you"
- when the Coordinator triggers the Workflow, the Workflow starts an Action, and the Action starts a YARN job... it's the Oozie server that authenticates against the YARN ResourceManager (typically as oozie); your Kerberos ticket has probably expired long ago
- but since oozie is defined as a privileged proxy account in the Hadoop config (the hadoop.proxyuser.oozie.* properties), the RM agrees to start the job under your account, even though you did not properly authenticate via Kerberos
- how is that possible?? Because internally YARN and HDFS use delegation tokens: usually, you authenticate once with Kerberos, then you get a token, and you are good for all core services on all nodes; with Oozie in the mix, you don't even have to authenticate...
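To check this picture from inside a running Java action, a small diagnostic can print what the container actually holds. This is a minimal sketch with a made-up class name (CredentialProbe); it assumes the launcher container advertises its delegation tokens through the HADOOP_TOKEN_FILE_LOCATION environment variable, as YARN containers normally do, and that the JVM would look for a Kerberos ticket cache through KRB5CCNAME. In the failing job you would typically see a token file but no ticket cache.

```java
import java.util.Map;

// Hypothetical diagnostic helper: reports which kinds of credentials the
// current process (e.g. an Oozie Java action in a YARN container) advertises.
final class CredentialProbe {

    // Pure function over an environment map, so it can be tested with fake values.
    static String describe(Map<String, String> env) {
        StringBuilder sb = new StringBuilder();
        // YARN containers normally point this variable at a file of delegation tokens.
        String tokenFile = env.get("HADOOP_TOKEN_FILE_LOCATION");
        sb.append(tokenFile != null
                ? "delegation tokens: " + tokenFile
                : "delegation tokens: none advertised");
        sb.append('\n');
        // A Kerberos ticket cache, if any, is usually named by KRB5CCNAME.
        String ccache = env.get("KRB5CCNAME");
        sb.append(ccache != null
                ? "Kerberos ticket cache: " + ccache
                : "Kerberos ticket cache: KRB5CCNAME not set (only the default location would be tried)");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(System.getenv()));
    }
}
```

Calling CredentialProbe.main from the action (or logging describe(System.getenv())) makes the "token but no TGT" situation visible in the launcher logs.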
But there's a catch: the HDFS/YARN delegation tokens do not work for services that authenticate clients with Kerberos themselves, e.g. Hive Metastore, Hive JDBC, HBase, ZooKeeper, Oozie, etc.
That's why Oozie has a workaround: explicit <credential> requests for Hive actions, Hive2 actions, HBase actions, etc. [disclaimer: I don't really know how it actually works]
I doubt that any of these "credentials" would work against Oozie itself...!
How you can manage your own custom authentication
- build a keytab file with your password inside (cf. the Linux command ktutil)
- upload that file to HDFS with restricted access, because anyone who can get access to that file could then log in as you!
- tell Oozie to download the file into the container that runs your Java action, with <file>; it will be available in the Current Working Dir, so you won't have to care about the actual path
- create a JAAS config file that explains to Java that "whenever the Oozie REST server requests authentication via SPNEGO, create a Kerberos ticket on-the-fly using this principal, whose password is in that keytab file" (instead of the default, which is "look for the ticket cache and get an existing ticket there")
- upload that JAAS config file to HDFS, use another <file>, etc.
- activate that JAAS config with a Java system property (java.security.auth.login.config)
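The JAAS steps above might look roughly like this in Java. It is a sketch under assumptions: the class name, the file names and the entry name are placeholders, and which entry name the Oozie client actually consults is left open by this answer. jaasConfigText produces the content of the file you would upload to HDFS and ship with <file>; activate points the JVM at the shipped copy in the current working directory. The second property, javax.security.auth.useSubjectCredsOnly=false, is a standard JDK switch that lets the GSS layer perform a JAAS login on its own instead of relying only on credentials already attached to the current Subject.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: JAAS config that logs in from a keytab, plus activation in the JVM.
// Entry name, principal and keytab file name are hypothetical placeholders.
final class JaasKeytabSetup {

    // Text of a JAAS login configuration with one entry that uses the JDK's
    // Krb5LoginModule and reads the password from a keytab (no prompting).
    static String jaasConfigText(String entryName, String principal, String keytab) {
        return entryName + " {\n"
                + "  com.sun.security.auth.module.Krb5LoginModule required\n"
                + "  useKeyTab=true\n"
                + "  keyTab=\"" + keytab + "\"\n"
                + "  principal=\"" + principal + "\"\n"
                + "  storeKey=true\n"
                + "  doNotPrompt=true\n"
                + "  useTicketCache=false;\n"
                + "};\n";
    }

    // Points the JVM at a JAAS file that Oozie shipped via <file>. A relative
    // path such as Paths.get("jaas.conf") resolves against the container's CWD.
    static void activate(Path jaasFile) throws IOException {
        Path abs = jaasFile.toAbsolutePath();
        if (!Files.isReadable(abs)) {
            throw new IOException("JAAS config not found in container: " + abs);
        }
        System.setProperty("java.security.auth.login.config", abs.toString());
        // Allow JGSS to run a JAAS login itself when the Subject has no TGT.
        System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
    }
}
```

Both properties should be set before the first Kerberos login in that JVM, since the JAAS configuration is typically loaded once. The same effect can be had by passing -Djava.security.auth.login.config=... in the action's <java-opts>.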
You will find more details in that post of mine: Error when connect to impala with JDBC under kerberos authrication
Disclaimer: I don't know which JAAS "subject" is expected by Oozie (for instance, ZooKeeper expects Client, Hive expects com.sun.security.jgss.krb5.initiate)
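One way to sidestep the question of which entry name Oozie expects is to perform the JAAS login yourself, under a name you choose, and run the Oozie call inside Subject.doAs. This swaps in a different technique from the config-file route, and it rests on an assumption worth verifying against your Oozie version: that the client's SPNEGO code (hadoop-auth's KerberosAuthenticator) uses a Kerberos Subject found in the calling context before falling back to the ticket cache. KeytabLogin and its method names are hypothetical; principal and keytab are placeholders.

```java
import java.security.PrivilegedExceptionAction;
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.Subject;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;
import javax.security.auth.login.LoginContext;

// Sketch: keytab login with an in-memory JAAS configuration, then run an
// action (e.g. the Oozie submission) as the logged-in Subject.
final class KeytabLogin {

    // In-memory JAAS configuration: every entry name maps to Krb5LoginModule
    // with keytab-based login, so no JAAS file or system property is needed.
    static Configuration keytabConfig(String principal, String keytab) {
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");
        options.put("keyTab", keytab);
        options.put("principal", principal);
        options.put("storeKey", "true");
        options.put("doNotPrompt", "true");
        options.put("useTicketCache", "false");
        AppConfigurationEntry entry = new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                AppConfigurationEntry.LoginModuleControlFlag.REQUIRED,
                options);
        return new Configuration() {
            @Override
            public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
                return new AppConfigurationEntry[] {entry};
            }
        };
    }

    // Logs in from the keytab (needs a reachable KDC and a valid keytab),
    // runs the action with the resulting Subject, then logs out.
    static <T> T runAs(String principal, String keytab, PrivilegedExceptionAction<T> action)
            throws Exception {
        LoginContext lc = new LoginContext("keytab-login", new Subject(), null,
                keytabConfig(principal, keytab));
        lc.login();
        try {
            return Subject.doAs(lc.getSubject(), action);
        } finally {
            lc.logout();
        }
    }
}
```

In the action, the call from the question would then become something like KeytabLogin.runAs("myname@REALM", "myname.keytab", () -> wc.run(conf)), with the keytab shipped via <file>. Building the Configuration in memory avoids both the extra JAAS file and the system property.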
Alternative: forget about JAAS and use the cache.
- set the env variable KRB5CCNAME to a temp file in the CWD of the container (which will be destroyed automatically when the job stops)
- spawn the Linux command kinit -kt myname.keytab myname@REALM, which will obtain a Kerberos ticket in the cache defined by KRB5CCNAME
- and let JAAS follow the default process
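The ticket-cache alternative can be driven from the Java action with ProcessBuilder. Sketch only: the cache file name krb5cc_oozie_action and the class name are made up, and principal and keytab are placeholders. One caveat the steps above gloss over: a running JVM cannot change its own environment, so setting KRB5CCNAME on the kinit child process does not make the JVM itself see it. For the Oozie client in the same JVM to find the ticket, KRB5CCNAME generally has to be present in the container's environment when the JVM starts, or the returned cache path has to be passed to Krb5LoginModule as its ticketCache option.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

// Sketch of the "ticket cache" alternative: run kinit from the Java action,
// with KRB5CCNAME pointing at a private cache file in the container's CWD.
final class KinitCache {

    // Builds (but does not start) the kinit process.
    static ProcessBuilder kinitCommand(String principal, String keytab, Path cacheFile) {
        ProcessBuilder pb = new ProcessBuilder(Arrays.asList("kinit", "-kt", keytab, principal));
        // Only the child process sees this variable; the running JVM's own
        // environment cannot be modified from Java.
        pb.environment().put("KRB5CCNAME", "FILE:" + cacheFile.toAbsolutePath());
        pb.inheritIO();
        return pb;
    }

    // Runs kinit and fails loudly if it does not succeed within a minute.
    // Returns the cache path so it can also be handed to Krb5LoginModule.
    static Path obtainTicket(String principal, String keytab)
            throws IOException, InterruptedException {
        Path cache = Paths.get("krb5cc_oozie_action").toAbsolutePath();
        Process p = kinitCommand(principal, keytab, cache).start();
        if (!p.waitFor(60, TimeUnit.SECONDS) || p.exitValue() != 0) {
            p.destroyForcibly();
            throw new IOException("kinit failed for " + principal);
        }
        return cache;
    }
}
```

Keeping the cache inside the container's working directory means it is cleaned up with the container, as the steps above intend, and it avoids sharing the default /tmp cache with other jobs running as the same user on that node.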