Issue connecting RStudio (but not R) to Hive with Kerberos


Question

I'm trying to connect RStudio to a Hive instance that uses Kerberos authentication. If I run the code below in an R script called from the command line, it works.

library("DBI")
library("rJava")
library("RJDBC")

cp = c("/u01/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc.jar"
, "/u01/cloudera/parcels/CDH/lib/hadoop/hadoop-common.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/libthrift-0.9.2.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/hive-service.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/httpclient-4.2.5.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/httpcore-4.2.5.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)

drv <- JDBC("org.apache.hive.jdbc.HiveDriver" , "hive-jdbc.jar" )

conn <- dbConnect(drv , "jdbc:hive2://XXXX:10000/default;principal=hive/XXXX@XXXXX;auth-kerberos")

If I run the exact same script in RStudio, I get an error:

javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

If I run system('klist') in RStudio, it shows I have a valid ticket. It seems RStudio isn't able to identify the ticket but R is. Any ideas?

Recommended answer

Some boring stuff first, to put things into context, then the solution.

  • Kerberos: it's complicated by nature (think cryptography plus networking), even without considering that Microsoft has its own implementation and extensions
  • Java and Kerberos: it's even more complicated (only partial support, subtle changes between Java versions, etc.)
  • Hadoop and Java and Kerberos: it's complicated and ugly (read the GitBook "Hadoop and Kerberos, the Madness beyond the Gate" if you really want to lose your sanity), and it's even worse on Windows, given the lack of an official build of the required Hadoop "native libs"
  • Hive and JDBC and Kerberos: the good news is that you don't need the Hadoop "ugly" part unless you are using the Apache JDBC driver on Windows (hint: ditch it and opt for the Cloudera JDBC driver!); the bad news is that you may need raw JAAS configuration and specific Java system properties
  • R and Java/JDBC: it works quite well, except that sometimes you want to pass specific Java system properties to the JVM, either at launch time or at run time, but .jinit does not support that AFAIK, so you must resort to a workaround


There is one Java system property that must be set for Kerberos auth to work in JDBC, and it's not always set by default.
But you can't set that Java property from R directly; you have to set an environment variable (either before starting R, or from R code but before .jinit)

Option 1: from a Linux shell script, before starting R...

export JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false"

Option 2: from your R code...

Sys.setenv(JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false")
.jinit(...)


Now, that may not be sufficient in all cases. Maybe you need to use a specific Kerberos config because your Hadoop cluster uses its own KDC. Maybe you don't want to use the default Kerberos ticket, but instead authenticate as a service account, using a password stored in a keytab file.
And maybe you need some debugging information because, well, shit happens (and security libraries are quite secretive by default, not to make things too easy for hackers, I suppose...)

Please refer to that post for more information about advanced Java configuration for Hive/Impala JDBC with Kerberos.
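
As a rough illustration of what such an extended setup might look like, here is a minimal sketch that adds a custom Kerberos config and debug tracing on top of the mandatory property. The file paths are hypothetical placeholders, and whether you need the JAAS line at all depends on your cluster, so treat this as a starting point rather than a known-good configuration.

```shell
# Hypothetical locations: replace with the files your cluster admin provides.
KRB5_CONF="/etc/krb5.conf"
JAAS_CONF="$HOME/jaas.conf"

# Mandatory property from the answer above.
JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false"

# Point the JVM at a specific Kerberos config (e.g. when the cluster has its own KDC).
JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -Djava.security.krb5.conf=$KRB5_CONF"

# Optional (assumption): a JAAS file, e.g. for keytab-based login. Uncomment if needed.
# JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -Djava.security.auth.login.config=$JAAS_CONF"

# Ask the JDK's Kerberos layer to print debug traces.
JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS -Dsun.security.krb5.debug=true"

export JAVA_TOOL_OPTIONS
echo "$JAVA_TOOL_OPTIONS"
```

Start R from the same shell afterwards so that the JVM launched by .jinit inherits the variable.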

And be careful when setting the environment variable: mimic a Java command line, i.e. -Dsome.key=value -Dsome.other.key=blahblah; in a shell script, use quotes (because of the separating space); in R code, use a single string, not a vector of strings.
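
To make the "one quoted string" rule concrete, here is a small sketch of a helper that joins key=value pairs into a single space-separated list of -D options. The function name join_java_opts is made up for this example, and the keys are the same placeholders used above.

```shell
# join_java_opts (hypothetical helper): turn each "key=value" argument into
# "-Dkey=value" and join them with single spaces into one string.
join_java_opts() {
  opts=""
  for kv in "$@"; do
    # ${opts:+$opts } expands to "<opts> " only when opts is non-empty,
    # so no leading space is added before the first option.
    opts="${opts:+$opts }-D$kv"
  done
  printf '%s' "$opts"
}

# The double quotes keep the whole list together as one value.
JAVA_TOOL_OPTIONS="$(join_java_opts "some.key=value" "some.other.key=blahblah")"
export JAVA_TOOL_OPTIONS
echo "$JAVA_TOOL_OPTIONS"
```

The same idea applies in R: build the full option list first, then hand it to Sys.setenv as one string.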
