Spark 1.5.1 not working with hive jdbc 1.2.0


Problem description

I am trying to execute a Hive query using Spark 1.5.1 in standalone mode with the Hive 1.2.0 JDBC version.

Here is my piece of code:

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class SparkHiveJdbcTest {
    private static final String HIVE_DRIVER = "org.apache.hive.jdbc.HiveDriver";
    private static final String HIVE_CONNECTION_URL = "jdbc:hive2://localhost:10000/idw";

    private static final SparkConf sparkconf = new SparkConf()
            .set("spark.master", "spark://impetus-i0248u:7077")
            .set("spark.app.name", "sparkhivesqltest")
            .set("spark.cores.max", "1")
            .set("spark.executor.memory", "512m");

    private static final JavaSparkContext sc = new JavaSparkContext(sparkconf);
    private static final SQLContext sqlContext = new SQLContext(sc);

    public static void main(String[] args) {
        // Data source options for the jdbc data source
        Map<String, String> options = new HashMap<String, String>();
        options.put("driver", HIVE_DRIVER);
        options.put("url", HIVE_CONNECTION_URL);
        options.put("dbtable", "(select * from idw.emp) as employees_name");

        DataFrame jdbcDF = sqlContext.read().format("jdbc").options(options).load();
    }
}

I am getting the following exception at sqlContext.read().format("jdbc").options(options).load():

Exception in thread "main" java.sql.SQLException: Method not supported
    at org.apache.hive.jdbc.HiveResultSetMetaData.isSigned(HiveResultSetMetaData.java:143)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:135)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
    at org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:60)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)

I am running Spark 1.5.1 in standalone mode. The Hadoop version is 2.6 and the Hive version is 1.2.0.

Here are the dependencies that I have added in the pom.xml of the Java project:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.1</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.5.1</version>
</dependency>

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.2.0</version>
    <exclusions>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.0</version>
</dependency>

Can anyone help me out with this? If somebody has used Spark 1.5.1 with Hive JDBC, can you please tell me the Hive version compatible with Spark 1.5.1?

Thanks in advance!

Recommended answer

As far as I can tell, you're unfortunately out of luck in terms of using the JDBC connector until it's fixed upstream. The "Method not supported" in this case is not just a version mismatch: the method is explicitly not implemented in the Hive JDBC library on branch-1.2, and even if you look at the Hive JDBC master branch or branch-2.0 it's still not implemented:

public boolean isSigned(int column) throws SQLException {
  throw new SQLException("Method not supported");
}

Looking at the Spark call site, isSigned is called during resolveTable in Spark 1.5 as well as at master.

That said, most likely the real reason this "issue" remains open is that when interacting with Hive you're expected to connect to the Hive metastore directly rather than mess around with JDBC connectors; see the Hive Tables in Spark documentation for how to do this. Essentially, you should think of Spark as an equal/replacement of Hive rather than as a consumer of Hive.

This way, pretty much all you do is add hive-site.xml to your Spark conf/ directory and make sure the datanucleus jars under lib_managed/jars are available to all Spark executors. Spark then talks directly to the Hive metastore for schema information and fetches the data directly from HDFS in a way amenable to nicely parallelized RDDs.
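
For reference, a minimal sketch of that metastore-based approach might look like the following. It assumes hive-site.xml is already in Spark's conf/ directory and that the spark-hive_2.10 artifact (version 1.5.1) is on the classpath in place of hive-jdbc; the class name is hypothetical, while the master URL and the idw.emp table are reused from the question.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class SparkHiveMetastoreTest {
    public static void main(String[] args) {
        // Same cluster settings as in the question.
        SparkConf sparkconf = new SparkConf()
                .set("spark.master", "spark://impetus-i0248u:7077")
                .set("spark.app.name", "sparkhivemetastoretest")
                .set("spark.cores.max", "1")
                .set("spark.executor.memory", "512m");
        JavaSparkContext sc = new JavaSparkContext(sparkconf);

        // HiveContext resolves the schema through the Hive metastore
        // (configured via hive-site.xml), so no HiveServer2/JDBC hop and
        // no call to HiveResultSetMetaData.isSigned is involved.
        HiveContext hiveContext = new HiveContext(sc.sc());
        DataFrame emp = hiveContext.sql("SELECT * FROM idw.emp");
        emp.show();
    }
}

The query itself is only illustrative; the point is simply that the read goes through HiveContext rather than through the jdbc data source that triggers the exception above.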
