使用RJDBC/RHive从R连接到Remote Hive Server [英] connect to Remote Hive Server from R using RJDBC/RHive

查看:202
本文介绍了使用RJDBC/RHive从R连接到Remote Hive Server的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用RJDBC 0.2-5连接到Rstudio中的Hive.我的服务器有hadoop-2.4.1和hive-0.14.我按照下面提到的步骤连接到Hive.

I'm using RJDBC 0.2-5 to connect to Hive in Rstudio. My server has hadoop-2.4.1 and hive-0.14. I follow the below mention steps to connect to Hive.

library(DBI)
library(rJava)
library(RJDBC)
.jinit(parameters="-DrJava.debug=true")
drv <- JDBC("org.apache.hadoop.hive.jdbc.HiveDriver", 
            c("/home/packages/hive/New folder3/commons-logging-1.1.3.jar",
              "/home/packages/hive/New folder3/hive-jdbc-0.14.0.jar",
              "/home/packages/hive/New folder3/hive-metastore-0.14.0.jar",
              "/home/packages/hive/New folder3/hive-service-0.14.0.jar",
              "/home/packages/hive/New folder3/libfb303-0.9.0.jar",
              "/home/packages/hive/New folder3/libthrift-0.9.0.jar",
              "/home/packages/hive/New folder3/log4j-1.2.16.jar",
              "/home/packages/hive/New folder3/slf4j-api-1.7.5.jar",
              "/home/packages/hive/New folder3/slf4j-log4j12-1.7.5.jar",
              "/home/packages/hive/New folder3/hive-common-0.14.0.jar",
            "/home/packages/hive/New folder3/hadoop-core-0.20.2.jar",
            "/home/packages/hive/New folder3/hive-serde-0.14.0.jar",
             "/home/packages/hive/New folder3/hadoop-common-2.4.1.jar"),
            identifier.quote="`")

conHive <- dbConnect(drv, "jdbc:hive://myserver:10000/default",
                  "usr",
                  "pwd")

但是我总是遇到以下错误:

But I am always getting the following error:

.jcall(drv @ jdrv,"Ljava/sql/Connection;","connect", as.character(url)[1] ,: java.lang.NoClassDefFoundError:无法 初始化org.apache.hadoop.hive.conf.HiveConf $ ConfVars类

Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.conf.HiveConf$ConfVars

即使我尝试使用不同版本的Hive jar,Hive-jdbc-standalone.jar,但似乎也无济于事..我也使用RHive连接到Hive,但也没有成功.

Even I tried with different version of Hive jar, Hive-jdbc-standalone.jar but nothing seems to work.. I also use RHive to connect to Hive but there was also no success.

有人可以帮助我吗?..我有点被困住了:(

Can anyone help me?.. I kind of stuck :(

推荐答案

我没有尝试rHive,因为它似乎需要在群集的所有节点上进行复杂的安装.

I didn't try rHive because it seems to need a complex installation on all the nodes of the cluster.

我使用RJDBC成功连接到Hive,下面是可在我的Hadoop 2.6 CDH5.4集群上工作的代码段:

I successfully connect to Hive using RJDBC, here are a code snipet that works on my Hadoop 2.6 CDH5.4 cluster :

#loading libraries
library("DBI")
library("rJava")
library("RJDBC")

#init of the classpath (works with hadoop 2.6 on CDH 5.4 installation)
cp = c("/usr/lib/hive/lib/hive-jdbc.jar", "/usr/lib/hadoop/client/hadoop-common.jar", "/usr/lib/hive/lib/libthrift-0.9.2.jar", "/usr/lib/hive/lib/hive-service.jar", "/usr/lib/hive/lib/httpclient-4.2.5.jar", "/usr/lib/hive/lib/httpcore-4.2.5.jar", "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)

#initialisation de la connexion
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc.jar", identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")

#working with the connexion
show_databases <- dbGetQuery(conn, "show databases")
show_databases

更难的是找到所有需要的罐子以及在哪里找到它们...

The harder is to find all the needs jars and where to find them ...

更新 hive独立JAR包含使用Hive所需的全部内容,将此独立JAR与hadoop通用jar一起使用就足以使用Hive.

UPDATE The hive standalone JAR contains all that was needed to use Hive, using this standalone JAR with the hadoop-common jar is enough to use Hive.

因此,这是简化版本,无需担心hadoop常见的jar和hive-standalone的jar.

So this is a simplified version, no need to worry to other jars that the hadoop-common and the hive-standalone jars.

 #loading libraries
 library("DBI")
 library("rJava")
 library("RJDBC")

 #init of the classpath (works with hadoop 2.6 on CDH 5.4 installation)
 cp = c("/usr/lib/hadoop/client/hadoop-common.jar", "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
 .jinit(classpath=cp)

 #initialisation de la connexion
 drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc-standalone.jar", identifier.quote="`")
 conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")

 #working with the connexion
 show_databases <- dbGetQuery(conn, "show databases")
 show_databases

这篇关于使用RJDBC/RHive从R连接到Remote Hive Server的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆