Should I call ugi.checkTGTAndReloginFromKeytab() before every action on hadoop?

Question

In my server application I'm connecting to a Kerberos-secured Hadoop cluster from my Java application. I'm using various components like the HDFS file system, Oozie, Hive, etc. On application startup I call

UserGroupInformation.loginUserFromKeytabAndReturnUGI( ... );

This returns me a UserGroupInformation instance, and I keep it for the application's lifetime. When doing privileged actions I launch them with ugi.doAs(action).
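
For reference, a minimal self-contained sketch of this setup; the principal name and keytab path are hypothetical placeholders:

    import java.security.PrivilegedExceptionAction;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosHadoopClient {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // One-time login at startup; the UGI is kept for the application lifetime.
        // Principal and keytab path are placeholders, not real values.
        UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
            "myservice@EXAMPLE.COM", "/etc/security/keytabs/myservice.keytab");

        // Privileged actions are wrapped in ugi.doAs(...).
        ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
          FileSystem fs = FileSystem.get(conf);
          fs.listStatus(new Path("/"));
          return null;
        });
      }
    }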

This works fine, but I wonder if and when I should renew the Kerberos ticket in the UserGroupInformation. I found the method UserGroupInformation.checkTGTAndReloginFromKeytab(), which seems to do the ticket renewal whenever it's close to expiry. I also found that this method is called by various Hadoop tools, such as WebHdfsFileSystem.

Now, if I want my server application (possibly running for months or even years) to never experience ticket expiry, what is the best approach? To provide concrete questions:

  1. Can I rely on the various Hadoop clients to call checkTGTAndReloginFromKeytab whenever it's needed?
  2. Should I ever call checkTGTAndReloginFromKeytab myself in my code?
  3. If so, should I do that before every single call to ugi.doAs(...), or rather set up a timer and call it periodically (and how often)?

Answer

Hadoop committer here! This is an excellent question.

Unfortunately, it's difficult to give a definitive answer to this without a deep dive into the particular usage patterns of the application. Instead, I can offer general guidelines and describe when Hadoop would handle ticket renewal or re-login from a keytab automatically for you, and when it wouldn't.

The primary use case for Kerberos authentication in the Hadoop ecosystem is Hadoop's RPC framework, which uses SASL for authentication. Most of the daemon processes in the Hadoop ecosystem handle this by doing a single one-time call to UserGroupInformation#loginUserFromKeytab at process startup. Examples of this include the HDFS DataNode, which must authenticate its RPC calls to the NameNode, and the YARN NodeManager, which must authenticate its calls to the ResourceManager. How is it that daemons like the DataNode can do a one-time login at process startup and then keep on running for months, long past typical ticket expiration times?

Since this is such a common use case, Hadoop implements an automatic re-login mechanism directly inside the RPC client layer. The code for this is visible in the RPC Client#handleSaslConnectionFailure method:

          // try re-login
          if (UserGroupInformation.isLoginKeytabBased()) {
            UserGroupInformation.getLoginUser().reloginFromKeytab();
          } else if (UserGroupInformation.isLoginTicketBased()) {
            UserGroupInformation.getLoginUser().reloginFromTicketCache();
          }

You can think of this as "lazy evaluation" of re-login: it only re-executes the login in response to an authentication failure on an attempted RPC connection.

Knowing this, we can give a partial answer. If your application's usage pattern is to log in from a keytab and then perform typical Hadoop RPC calls, then you likely do not need to roll your own re-login code; the RPC client layer will do it for you. "Typical Hadoop RPC" means the vast majority of Java APIs for interacting with Hadoop, including the HDFS FileSystem API, the YarnClient, and MapReduce job submission.

However, some application usage patterns do not involve Hadoop RPC at all. An example would be applications that interact solely with Hadoop's REST APIs, such as WebHDFS or the YARN REST APIs. In that case, the authentication model uses Kerberos via SPNEGO, as described in the Hadoop HTTP Authentication documentation.

Knowing this, we can add more to our answer. If your application's usage pattern does not utilize Hadoop RPC at all, and instead sticks solely to the REST APIs, then you must roll your own re-login logic. This is exactly why WebHdfsFileSystem calls UserGroupInformation#checkTGTAndReloginFromKeytab, just like you noticed. WebHdfsFileSystem chooses to make the call right before every operation. This is a fine strategy, because UserGroupInformation#checkTGTAndReloginFromKeytab only renews the ticket if it's close to expiration; otherwise, the call is a no-op.
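
To make that concrete, here is a hedged sketch of doing the same in your own code. callWebHdfs() is a purely hypothetical helper standing in for whatever SPNEGO-authenticated REST call your application makes:

    import java.security.PrivilegedExceptionAction;

    import org.apache.hadoop.security.UserGroupInformation;

    void runAuthenticated(UserGroupInformation ugi) throws Exception {
      // No-op unless the TGT is close to expiry, so it's cheap to call
      // before every single operation, mirroring WebHdfsFileSystem.
      ugi.checkTGTAndReloginFromKeytab();
      ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
        callWebHdfs(); // hypothetical helper for your REST call
        return null;
      });
    }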

As a final use case, let's consider an interactive process, not logging in from a keytab, but rather requiring the user to run kinit externally before launching the application. In the vast majority of cases, these are going to be short-running applications, such as Hadoop CLI commands. However, in some cases these can be longer-running processes. To support longer-running processes, Hadoop starts a background thread to renew the Kerberos ticket when it's close to expiration. This logic is visible in UserGroupInformation#spawnAutoRenewalThreadForUserCreds. There is an important distinction here, though, compared to the automatic re-login logic provided in the RPC layer: in this case, Hadoop only has the capability to renew the ticket and extend its lifetime. Tickets have a maximum renewable lifetime, as dictated by the Kerberos infrastructure. After that, the ticket won't be usable anymore. Re-login in this case is practically impossible, because it would imply re-prompting the user for a password, and they have likely walked away from the terminal. This means that if the process keeps running beyond the expiration of the ticket, it won't be able to authenticate anymore.
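
For reference, a minimal sketch of this interactive pattern, assuming the user has already run kinit in the same environment (no keytab involved):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);

    // Picks up the Kerberos ticket cache created by kinit; for ticket-based
    // logins Hadoop also spawns the auto-renewal thread described above.
    UserGroupInformation ugi = UserGroupInformation.getLoginUser();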

Again, we can use this information to inform our overall answer. If you rely on a user to log in interactively via kinit before launching the application, and if you're confident the application won't run longer than the Kerberos ticket's maximum renewable lifetime, then you can rely on Hadoop internals to cover periodic renewal for you.

If you're using keytab-based login, and you're just not sure whether your application's usage pattern can rely on the Hadoop RPC layer's automatic re-login, then the conservative approach is to roll your own. @SamsonScharfrichter gave an excellent answer here about rolling your own:

HBase Kerberos connection renewal strategy

Finally, I should add a note about API stability. The Apache Hadoop Compatibility guidelines discuss the Hadoop development community's commitment to backwards compatibility in full detail. The interface of UserGroupInformation is annotated LimitedPrivate and Evolving. Technically, this means the API of UserGroupInformation is not considered public, and it could evolve in backwards-incompatible ways. As a practical matter, there is already a lot of code depending on the interface of UserGroupInformation, so it's simply not feasible for us to make a breaking change. Certainly within the current 2.x release line, I would not have any fear about method signatures changing out from under you and breaking your code.

Now that we have all of this background information, let's revisit your concrete questions.

Can I rely on the various Hadoop clients to call checkTGTAndReloginFromKeytab whenever it's needed?

You can rely on this if your application's usage pattern is to call the Hadoop clients, which in turn utilize Hadoop's RPC framework. You cannot rely on this if your application's usage pattern only calls the Hadoop REST APIs.

Should I ever call checkTGTAndReloginFromKeytab myself in my code?

You'll likely need to do this if your application's usage pattern is solely to call the Hadoop REST APIs instead of Hadoop RPC calls. You would not then get the benefit of the automatic re-login implemented inside Hadoop's RPC client.

If so, should I do that before every single call to ugi.doAs(...), or rather set up a timer and call it periodically (and how often)?

It's fine to call UserGroupInformation#checkTGTAndReloginFromKeytab right before every action that needs to be authenticated. If the ticket is not close to expiration, the method will be a no-op. If you suspect your Kerberos infrastructure is sluggish and you don't want client operations to pay the latency cost of re-login, that would be a reason to do it in a separate background thread instead. Just be sure to stay a little bit ahead of the ticket's actual expiration time; you could borrow the logic inside UserGroupInformation for determining whether a ticket is "close" to expiration. In practice, I've never personally seen the latency of re-login be problematic. A sketch of the background-thread approach follows.
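
If you go the background-thread route, something along these lines would work, assuming keytab-based login. The one-minute period is an arbitrary choice for the sketch; each tick is a no-op unless the ticket is close to expiring:

    import java.io.IOException;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    import org.apache.hadoop.security.UserGroupInformation;

    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(() -> {
      try {
        // No-op unless the TGT is close to expiry.
        ugi.checkTGTAndReloginFromKeytab();
      } catch (IOException e) {
        // Log and keep the thread alive; the next tick will retry.
        e.printStackTrace();
      }
    }, 1, 1, TimeUnit.MINUTES);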
