java.sql.SQLException: No suitable driver found when loading DataFrame into Spark SQL

Problem description
I'm hitting a very strange problem when trying to load a JDBC DataFrame into Spark SQL.
I've tried several Spark clusters: YARN, a standalone cluster, and pseudo-distributed mode on my laptop. The problem is reproducible on both Spark 1.3.0 and 1.3.1, and occurs both in spark-shell and when executing the code with spark-submit. I've tried the MySQL and MS SQL JDBC drivers without success.
Consider the following example:
val driver = "com.mysql.jdbc.Driver"
val url = "jdbc:mysql://localhost:3306/test"

val t1 = {
  sqlContext.load("jdbc", Map(
    "url" -> url,
    "driver" -> driver,
    "dbtable" -> "t1",
    "partitionColumn" -> "id",
    "lowerBound" -> "0",
    "upperBound" -> "100",
    "numPartitions" -> "50"
  ))
}
So far so good, the schema gets resolved properly:
t1: org.apache.spark.sql.DataFrame = [id: int, name: string]
But when I evaluate the DataFrame:
t1.take(1)
The following exception occurs:
15/04/29 01:56:44 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.1.42): java.sql.SQLException: No suitable driver found for jdbc:mysql://<hostname>:3306/test
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:158)
at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:150)
at org.apache.spark.sql.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:317)
at org.apache.spark.sql.jdbc.JDBCRDD.compute(JDBCRDD.scala:309)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
When I try to open a JDBC connection on an executor:
import java.sql.DriverManager

sc.parallelize(0 until 2, 2).map { i =>
  Class.forName(driver)
  val conn = DriverManager.getConnection(url)
  conn.close()
  i
}.collect()
It works perfectly:
res1: Array[Int] = Array(0, 1)
When I run the same code on local Spark, it works perfectly too:
scala> t1.take(1)
...
res0: Array[org.apache.spark.sql.Row] = Array([1,one])
I'm using Spark pre-built with Hadoop 2.4 support.
The easiest way to reproduce the problem is to start Spark in pseudo-distributed mode with the start-all.sh script and run the following command:
/path/to/spark-shell --master spark://<hostname>:7077 --jars /path/to/mysql-connector-java-5.1.35.jar --driver-class-path /path/to/mysql-connector-java-5.1.35.jar
Is there a way to work around this? It looks like a severe problem, so it's strange that googling doesn't help here.
Answer
Apparently this issue has been recently reported:
https://issues.apache.org/jira/browse/SPARK-6913
The problem is in java.sql.DriverManager, which doesn't see drivers loaded by ClassLoaders other than the bootstrap ClassLoader.
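For illustration, here is a minimal, self-contained sketch of how DriverManager resolves drivers (plain JDK only, no real database involved). The DummyDriver class and its jdbc:dummy: URL scheme are invented for this sketch; real JDBC drivers register themselves with DriverManager the same way, from a static initializer, when their class is loaded. Resolution succeeds here because the stub is visible to the caller's ClassLoader; with --jars, executors load the driver in a different ClassLoader, and DriverManager refuses to use it.

```scala
import java.sql.{Connection, Driver, DriverManager, DriverPropertyInfo, SQLFeatureNotSupportedException}
import java.util.Properties
import java.util.logging.Logger

// Hypothetical stub driver that only claims its own URL scheme.
class DummyDriver extends Driver {
  override def acceptsURL(url: String): Boolean = url.startsWith("jdbc:dummy:")
  // Per the JDBC contract, connect returns null for URLs the driver does not
  // handle; a real driver would return a live Connection for accepted URLs.
  override def connect(url: String, info: Properties): Connection =
    if (acceptsURL(url)) throw new UnsupportedOperationException("stub, no real database")
    else null
  override def getPropertyInfo(url: String, info: Properties): Array[DriverPropertyInfo] =
    Array.empty[DriverPropertyInfo]
  override def getMajorVersion: Int = 1
  override def getMinorVersion: Int = 0
  override def jdbcCompliant(): Boolean = false
  override def getParentLogger: Logger = throw new SQLFeatureNotSupportedException()
}

// Registration makes the driver a candidate for DriverManager lookups.
DriverManager.registerDriver(new DummyDriver)

// Lookup succeeds because DummyDriver was loaded by our own ClassLoader.
val resolved = DriverManager.getDriver("jdbc:dummy:test")
println(resolved.acceptsURL("jdbc:dummy:test")) // true
```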
As a temporary workaround, it's possible to add the required drivers to the boot classpath of the executors.
UPDATE: This pull request fixes the problem: https://github.com/apache/spark/pull/5782
UPDATE 2: The fix was merged into Spark 1.4.