sqlContext HiveDriver error on SQLException: Method not supported


Problem Description

I have been trying to use sqlContext.read.format("jdbc").options(driver="org.apache.hive.jdbc.HiveDriver") to get a Hive table into Spark, without any success. I have done some research and read the following:

How to connect to remote hive server from spark

Spark 1.5.1 not working with hive jdbc 1.2.0

http://belablotski.blogspot.in/2016/01/access-hive-tables-from-spark-using.html

I used the latest Hortonworks Sandbox 2.6 and asked the community there the same question:

https://community.hortonworks.com/questions/156828/pyspark-jdbc-py4jjavaerror-calling-o95load-javasql.html?childToView=156936#answer-156936

What I want to do is very simple via pyspark:

df = sqlContext.read.format("jdbc").options(driver="org.apache.hive.jdbc.HiveDriver", url="jdbc:hive2://localhost:10016/default", dbtable="sample_07", user="maria_dev", password="maria_dev").load()

This gives me the following error:

17/12/30 19:55:14 INFO HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://localhost:10016/default
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/hdp/current/spark-client/python/pyspark/sql/readwriter.py", line 139, in load
    return self._df(self._jreader.load())
  File "/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
  File "/usr/hdp/current/spark-client/python/pyspark/sql/utils.py", line 45, in deco
    return f(*a, **kw)
  File "/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o119.load.
: java.sql.SQLException: Method not supported
at org.apache.hive.jdbc.HiveResultSetMetaData.isSigned(HiveResultSetMetaData.java:143)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:136)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
at org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:57)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:748)

With beeline, it works fine:

beeline> !connect jdbc:hive2://localhost:10016/default maria_dev maria_dev
Connecting to jdbc:hive2://localhost:10016/default
Connected to: Spark SQL (version 2.1.1.2.6.1.0-129)
Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10016/default> select * from sample_07 limit 2;
+----------+-------------------------+------------+---------+--+
|   code   |       description       | total_emp  | salary  |
+----------+-------------------------+------------+---------+--+
| 00-0000  | All Occupations         | 134354250  | 40690   |
| 11-0000  | Management occupations  | 6003930    | 96150   |
+----------+-------------------------+------------+---------+--+

I could also do this:

spark = SparkSession.Builder().appName("testapp").enableHiveSupport().getOrCreate()
spark.sql("select * from default.sample_07").collect()

But this reads Hive metadata directly. I would like to go through JDBC to the Spark Thrift Server for fine-grained security.

I could do this for PostgreSQL like so:

sqlContext.read.format("jdbc").options(driver="org.postgresql.Driver")

I could also use Scala's java.sql.{DriverManager, Connection, Statement, ResultSet} to create a JDBC connection as a client to get to Spark. But that basically pulls all the data into memory and then re-creates the DataFrame manually.
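
For reference, here is a minimal sketch of what that client-side approach looks like, assuming the spark-shell sqlContext, the Thrift Server URL, and the sample_07 table from the beeline session above (the query and tuple types are illustrative, and the Hive JDBC driver must be on the classpath). It shows exactly the drawback: every row is collected into driver memory before the DataFrame is rebuilt.

import java.sql.DriverManager
import scala.collection.mutable.ListBuffer

// Plain JDBC client connection to the Spark Thrift Server
val conn = DriverManager.getConnection(
  "jdbc:hive2://localhost:10016/default", "maria_dev", "maria_dev")
val stmt = conn.createStatement()
val rs = stmt.executeQuery("select code, description, total_emp, salary from sample_07")

// Pull every row into driver memory -- this is the part I want to avoid
val rows = new ListBuffer[(String, String, Int, Int)]()
while (rs.next()) {
  rows += ((rs.getString(1), rs.getString(2), rs.getInt(3), rs.getInt(4)))
}
conn.close()

// Manually re-create a DataFrame from the collected rows
val df = sqlContext.createDataFrame(rows).toDF("code", "description", "total_emp", "salary")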

So the question is: is there a way to create a Spark DataFrame from Hive table data without loading the data into the memory of a JDBC client, as in the Scala approach, and without using SparkSession.Builder() as in the example above? My use case is that I need to deal with fine-grained security.

Recommended Answer

I'm not sure I understand your question correctly, but from what I understand you need to get a Hive table into a data frame, and for that you don't need a JDBC connection; in your example links they are trying to connect to different databases (RDBMS), not Hive.

Please see the approach below; using the hive context you can get the table into a data frame.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.DataFrame

def main(args: Array[String]): Unit = {

  val sparkConf = new SparkConf().setAppName("APPName")
  val sc = new SparkContext(sparkConf)
  val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

  // Get the Hive table into a DataFrame (no .first() here, which would return a single Row)
  val hive_df: DataFrame = hiveContext.sql("select * from schema.table")

  // Other way:
  // val hive_df = hiveContext.table("SchemaName.TableName")

  // Below will print the first row
  hive_df.first()
  // Count on the DataFrame
  hive_df.count()

}

If you really want to use the JDBC connection, I have the example below that I used for an Oracle database, which might help you.

val oracle_data = sqlContext.load("jdbc", Map("url" -> "jdbc:oracle:thin:username/password@//hostname:2134/databaseName", "dbtable" -> "Your query tmp", "driver" -> "oracle.jdbc.driver.OracleDriver"))
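
Note that SQLContext.load was deprecated in later Spark 1.x releases; if it helps, the same call can be expressed with the DataFrameReader API (same illustrative connection details as above):

val oracle_data = sqlContext.read.format("jdbc").options(Map(
  "url" -> "jdbc:oracle:thin:username/password@//hostname:2134/databaseName",
  "dbtable" -> "Your query tmp",
  "driver" -> "oracle.jdbc.driver.OracleDriver")).load()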

