Spark 1.5.1 not working with Hive JDBC 1.2.0


Question

I am trying to execute a Hive query using Spark 1.5.1 in standalone mode and Hive JDBC version 1.2.0.

Here is my piece of code:

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class SparkHiveJdbcTest {
    private static final String HIVE_DRIVER = "org.apache.hive.jdbc.HiveDriver";
    private static final String HIVE_CONNECTION_URL = "jdbc:hive2://localhost:10000/idw";
    private static final SparkConf sparkconf = new SparkConf()
            .set("spark.master", "spark://impetus-i0248u:7077")
            .set("spark.app.name", "sparkhivesqltest")
            .set("spark.cores.max", "1")
            .set("spark.executor.memory", "512m");
    private static final JavaSparkContext sc = new JavaSparkContext(sparkconf);
    private static final SQLContext sqlContext = new SQLContext(sc);

    public static void main(String[] args) {
        // Data source options for the JDBC relation
        Map<String, String> options = new HashMap<String, String>();
        options.put("driver", HIVE_DRIVER);
        options.put("url", HIVE_CONNECTION_URL);
        options.put("dbtable", "(select * from idw.emp) as employees_name");
        // Fails here with "Method not supported" (see stack trace below)
        DataFrame jdbcDF = sqlContext.read().format("jdbc").options(options).load();
    }
}

I am getting the following error at sqlContext.read().format("jdbc").options(options).load():

Exception in thread "main" java.sql.SQLException: Method not supported
    at org.apache.hive.jdbc.HiveResultSetMetaData.isSigned(HiveResultSetMetaData.java:143)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:135)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
    at org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:60)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:125)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)

I am running Spark 1.5.1 in standalone mode. The Hadoop version is 2.6 and the Hive version is 1.2.0.

Here are the dependencies I have added to the pom.xml of my Java project:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.1</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.5.1</version>
</dependency>

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.2.0</version>
    <exclusions>
        <exclusion>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.0</version>
</dependency>

Can anyone help me out with this? If somebody has used Spark 1.5.1 with the Hive JDBC driver, can you please tell me which Hive version is compatible with Spark 1.5.1?

Thanks in advance!

Answer

As far as I can tell, you're unfortunately out of luck in terms of using the JDBC connector until it's fixed upstream; the "Method not supported" in this case is not just a version mismatch, but is explicitly not implemented in the Hive JDBC library on branch-1.2 (https://github.com/apache/hive/blob/branch-1.2/jdbc/src/java/org/apache/hive/jdbc/HiveResultSetMetaData.java#L143), and even if you look at the Hive JDBC master branch (https://github.com/apache/hive/blob/master/jdbc/src/java/org/apache/hive/jdbc/HiveResultSetMetaData.java#L143) or branch-2.0 (https://github.com/apache/hive/blob/branch-2.0/jdbc/src/java/org/apache/hive/jdbc/HiveResultSetMetaData.java#L143), it's still not implemented:

// org.apache.hive.jdbc.HiveResultSetMetaData (branch-1.2)
public boolean isSigned(int column) throws SQLException {
  throw new SQLException("Method not supported");
}

Looking at the Spark call site, isSigned is called during resolveTable in Spark 1.5 (https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala#L135) as well as on master (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala#L137).
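
To see the failure in isolation, here is a minimal sketch that reproduces the exception through plain JDBC, with no Spark involved; it assumes HiveServer2 is reachable at the URL from the question and that the table idw.emp exists:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class IsSignedRepro {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/idw");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select * from idw.emp")) {
            ResultSetMetaData md = rs.getMetaData();
            // Spark's JDBCRDD.resolveTable makes this same call for every column
            // while building the DataFrame schema, so load() always fails.
            md.isSigned(1); // throws java.sql.SQLException: Method not supported
        }
    }
}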

That said, most likely the real reason this "issue" remains open is that when interacting with Hive, you're expected to connect to the Hive metastore directly rather than needing to mess around with JDBC connectors; see the Hive Tables section of the Spark documentation for how to do this. Essentially, you want to think of Spark as an equal/replacement of Hive rather than as a consumer of Hive.

This way, pretty much all you do is add hive-site.xml to your Spark conf/ directory and make sure the datanucleus jars under lib_managed/jars are available to all Spark executors; Spark then talks directly to the Hive metastore for schema information and fetches the data directly from your HDFS in a way amenable to nicely parallelized RDDs.
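
With that setup in place, the query from the question reduces to a minimal sketch along these lines (assuming the spark-hive_2.10 artifact at version 1.5.1 is also on the classpath, and that hive-site.xml points at your metastore):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class SparkHiveMetastoreQuery {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .set("spark.master", "spark://impetus-i0248u:7077")
                .set("spark.app.name", "sparkhivesqltest");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // HiveContext picks up hive-site.xml from Spark's conf/ directory and
        // talks to the metastore directly; the Hive JDBC driver is never used.
        HiveContext hiveContext = new HiveContext(sc.sc());
        DataFrame employees = hiveContext.sql("SELECT * FROM idw.emp");
        employees.show();
    }
}

Because the scan is planned from the metastore schema and the underlying HDFS files rather than funneled through a single JDBC result set, the read is parallelized across executors for free.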

