Should I call ugi.checkTGTAndReloginFromKeytab() before every action on hadoop?


Problem Description


In my server application I'm connecting to a Kerberos-secured Hadoop cluster from my Java application. I'm using various components like the HDFS file system, Oozie, Hive, etc. On application startup I call

UserGroupInformation.loginUserFromKeytabAndReturnUGI(...);

This returns a UserGroupInformation instance, which I keep for the application's lifetime. When doing privileged actions I launch them with ugi.doAs(action).
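
A minimal, self-contained sketch of that setup (the principal, keytab path, and sample HDFS check are hypothetical placeholders):

    import java.security.PrivilegedExceptionAction;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KeytabLoginExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // One-time login at application startup.
            UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                    "app-user@EXAMPLE.COM", "/etc/security/keytabs/app-user.keytab");

            // Privileged actions are launched through doAs(...).
            boolean exists = ugi.doAs((PrivilegedExceptionAction<Boolean>) () ->
                    FileSystem.get(conf).exists(new Path("/tmp")));
            System.out.println("/tmp exists: " + exists);
        }
    }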

This works fine, but I wonder if and when I should renew the Kerberos ticket in UserGroupInformation. I found the method UserGroupInformation.checkTGTAndReloginFromKeytab(), which seems to do the ticket renewal whenever the ticket is close to expiry. I also found that this method is called by various Hadoop tools, such as WebHdfsFileSystem.

Now, if I want my server application (possibly running for months or even years) to never experience ticket expiry, what is the best approach? To ask concrete questions:

  1. Can I rely on the various Hadoop clients to call checkTGTAndReloginFromKeytab whenever it's needed?
  2. Should I ever call checkTGTAndReloginFromKeytab myself in my code?
  3. If so, should I do that before every single call to ugi.doAs(...), or rather set up a timer and call it periodically (and how often)?

Solution

Hadoop committer here! This is an excellent question.

Unfortunately, it's difficult to give a definitive answer to this without a deep dive into the particular usage patterns of the application. Instead, I can offer general guidelines and describe when Hadoop would handle ticket renewal or re-login from a keytab automatically for you, and when it wouldn't.

The primary use case for Kerberos authentication in the Hadoop ecosystem is Hadoop's RPC framework, which uses SASL for authentication. Most of the daemon processes in the Hadoop ecosystem handle this by doing a single one-time call to UserGroupInformation#loginUserFromKeytab at process startup. Examples of this include the HDFS DataNode, which must authenticate its RPC calls to the NameNode, and the YARN NodeManager, which must authenticate its calls to the ResourceManager. How is it that daemons like the DataNode can do a one-time login at process startup and then keep on running for months, long past typical ticket expiration times?

Since this is such a common use case, Hadoop implements an automatic re-login mechanism directly inside the RPC client layer. The code for this is visible in the RPC Client#handleSaslConnectionFailure method:

          // try re-login
          if (UserGroupInformation.isLoginKeytabBased()) {
            UserGroupInformation.getLoginUser().reloginFromKeytab();
          } else if (UserGroupInformation.isLoginTicketBased()) {
            UserGroupInformation.getLoginUser().reloginFromTicketCache();
          }

You can think of this as "lazy evaluation" of re-login. It only re-executes login in response to an authentication failure on an attempted RPC connection.

Knowing this, we can give a partial answer. If your application's usage pattern is to login from a keytab and then perform typical Hadoop RPC calls, then you likely do not need to roll your own re-login code. The RPC client layer will do it for you. "Typical Hadoop RPC" means the vast majority of Java APIs for interacting with Hadoop, including the HDFS FileSystem API, the YarnClient and MapReduce Job submissions.

However, some application usage patterns do not involve Hadoop RPC at all. An example of this would be applications that interact solely with Hadoop's REST APIs, such as WebHDFS or the YARN REST APIs. In that case, the authentication model uses Kerberos via SPNEGO as described in the Hadoop HTTP Authentication documentation.

Knowing this, we can add more to our answer. If your application's usage pattern does not utilize Hadoop RPC at all, and instead sticks solely to the REST APIs, then you must roll your own re-login logic. This is exactly why WebHdfsFileSystem calls UserGroupInformation#checkTGTAndReloginFromKeytab, just like you noticed. WebHdfsFileSystem chooses to make the call right before every operation. This is a fine strategy, because UserGroupInformation#checkTGTAndReloginFromKeytab only renews the ticket if it's "close" to expiration. Otherwise, the call is a no-op.
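
As an illustration of that strategy for your own REST-only code (a sketch, not WebHdfsFileSystem's actual implementation; the helper name is hypothetical), you can wrap every authenticated operation so the TGT check runs first:

    import java.security.PrivilegedExceptionAction;

    import org.apache.hadoop.security.UserGroupInformation;

    public final class KerberosActions {
        // Run before every REST operation: checkTGTAndReloginFromKeytab() is
        // a no-op unless the ticket is close to expiration, in which case it
        // re-logs in from the keytab.
        public static <T> T doAuthenticated(UserGroupInformation ugi,
                                            PrivilegedExceptionAction<T> action) throws Exception {
            ugi.checkTGTAndReloginFromKeytab();
            return ugi.doAs(action);
        }
    }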

As a final use case, let's consider an interactive process, not logging in from a keytab, but rather requiring the user to run kinit externally before launching the application. In the vast majority of cases, these are going to be short-running applications, such as Hadoop CLI commands. However, in some cases these can be longer-running processes. To support longer-running processes, Hadoop starts a background thread to renew the Kerberos ticket "close" to expiration. This logic is visible in UserGroupInformation#spawnAutoRenewalThreadForUserCreds. There is an important distinction here though compared to the automatic re-login logic provided in the RPC layer. In this case, Hadoop only has the capability to renew the ticket and extend its lifetime. Tickets have a maximum renewable lifetime, as dictated by the Kerberos infrastructure. After that, the ticket won't be usable anymore. Re-login in this case is practically impossible, because it would imply re-prompting the user for a password, and they likely walked away from the terminal. This means that if the process keeps running beyond expiration of the ticket, it won't be able to authenticate anymore.

Again, we can use this information to inform our overall answer. If you rely on a user to login interactively via kinit before launching the application, and if you're confident the application won't run longer than the Kerberos ticket's maximum renewable lifetime, then you can rely on Hadoop internals to cover periodic renewal for you.

If you're using keytab-based login, and you're just not sure if your application's usage pattern can rely on the Hadoop RPC layer's automatic re-login, then the conservative approach is to roll your own. @SamsonScharfrichter gave an excellent answer here about rolling your own.

HBase Kerberos connection renewal strategy

Finally, I should add a note about API stability. The Apache Hadoop Compatibility guidelines discuss the Hadoop development community's commitment to backwards-compatibility in full detail. The interface of UserGroupInformation is annotated LimitedPrivate and Evolving. Technically, this means the API of UserGroupInformation is not considered public, and it could evolve in backwards-incompatible ways. As a practical matter, there is a lot of code already depending on the interface of UserGroupInformation, so it's simply not feasible for us to make a breaking change. Certainly within the current 2.x release line, I would not have any fear about method signatures changing out from under you and breaking your code.

Now that we have all of this background information, let's revisit your concrete questions.

Can I rely on the various Hadoop clients to call checkTGTAndReloginFromKeytab whenever it's needed?

You can rely on this if your application's usage pattern is to call the Hadoop clients, which in turn utilize Hadoop's RPC framework. You cannot rely on this if your application's usage pattern only calls the Hadoop REST APIs.

Should I ever call checkTGTAndReloginFromKeytab myself in my code?

You'll likely need to do this if your application's usage pattern is solely to call the Hadoop REST APIs instead of Hadoop RPC calls. You would not get the benefit of the automatic re-login implemented inside Hadoop's RPC client.

If so, should I do that before every single call to ugi.doAs(...), or rather set up a timer and call it periodically (and how often)?

It's fine to call UserGroupInformation#checkTGTAndReloginFromKeytab right before every action that needs to be authenticated. If the ticket is not close to expiration, then the method will be a no-op. If you're suspicious that your Kerberos infrastructure is sluggish, and you don't want client operations to pay the latency cost of re-login, then that would be a reason to do it in a separate background thread. Just be sure to stay a little bit ahead of the ticket's actual expiration time. You might borrow the logic inside UserGroupInformation for determining if a ticket is "close" to expiration. In practice, I've never personally seen the latency of re-login be problematic.
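
If you do choose the background-thread approach, a sketch along these lines would work; the one-minute period is an arbitrary assumption, and you should pick something comfortably ahead of your ticket lifetime:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    import org.apache.hadoop.security.UserGroupInformation;

    public final class TgtReloginThread {
        public static void start(UserGroupInformation ugi) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "tgt-relogin");
                t.setDaemon(true); // don't keep the JVM alive just for this thread
                return t;
            });
            scheduler.scheduleAtFixedRate(() -> {
                try {
                    // No-op unless the ticket is close to expiration.
                    ugi.checkTGTAndReloginFromKeytab();
                } catch (Exception e) {
                    // Log and retry on the next tick; transient KDC failures happen.
                    e.printStackTrace();
                }
            }, 1, 1, TimeUnit.MINUTES);
        }
    }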
