java.sql.SQLException: No suitable driver found when loading DataFrame into Spark SQL
Question
I'm hitting a very strange problem when trying to load a JDBC DataFrame into Spark SQL.
I've tried several Spark clusters: YARN, a standalone cluster, and pseudo-distributed mode on my laptop. It's reproducible on both Spark 1.3.0 and 1.3.1. The problem occurs both in spark-shell and when executing the code with spark-submit. I've tried the MySQL and MS SQL JDBC drivers without success.
Consider the following example:
val driver = "com.mysql.jdbc.Driver"
val url = "jdbc:mysql://localhost:3306/test"

val t1 = sqlContext.load("jdbc", Map(
  "url" -> url,
  "driver" -> driver,
  "dbtable" -> "t1",
  "partitionColumn" -> "id",
  "lowerBound" -> "0",
  "upperBound" -> "100",
  "numPartitions" -> "50"
))
So far so good, the schema gets resolved properly:
t1: org.apache.spark.sql.DataFrame = [id: int, name: string]
But when I evaluate the DataFrame:
t1.take(1)
it fails with the following exception:
15/04/29 01:56:44 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.1.42): java.sql.SQLException: No suitable driver found for jdbc:mysql://<hostname>:3306/test
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:158)
at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:150)
at org.apache.spark.sql.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:317)
at org.apache.spark.sql.jdbc.JDBCRDD.compute(JDBCRDD.scala:309)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
When I try to open a JDBC connection on an executor:
import java.sql.DriverManager
sc.parallelize(0 until 2, 2).map { i =>
  Class.forName(driver)
  val conn = DriverManager.getConnection(url)
  conn.close()
  i
}.collect()
it works perfectly:
res1: Array[Int] = Array(0, 1)
When I run the same code on local Spark, it works perfectly too:
scala> t1.take(1)
...
res0: Array[org.apache.spark.sql.Row] = Array([1,one])
I'm using Spark pre-built with Hadoop 2.4 support.
The easiest way to reproduce the problem is to start Spark in pseudo-distributed mode with the start-all.sh script and run the following command:
/path/to/spark-shell --master spark://<hostname>:7077 --jars /path/to/mysql-connector-java-5.1.35.jar --driver-class-path /path/to/mysql-connector-java-5.1.35.jar
Is there a way to work around this? It looks like a severe problem, so it's strange that googling doesn't help here.
Answer
Apparently this issue has been recently reported:
https://issues.apache.org/jira/browse/SPARK-6913
The problem is in java.sql.DriverManager, which doesn't see drivers loaded by ClassLoaders other than the bootstrap ClassLoader.
As a temporary workaround, it's possible to add the required drivers to the boot classpath of the executors.
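For example (the paths and master URL below are placeholders for your own setup), `spark.executor.extraClassPath` puts the driver jar on each executor JVM's classpath at launch, rather than shipping it through the `--jars` classloader that DriverManager can't see:

```shell
# Illustrative config fragment; adjust the jar path and master URL.
/path/to/spark-shell \
  --master spark://<hostname>:7077 \
  --driver-class-path /path/to/mysql-connector-java-5.1.35.jar \
  --conf spark.executor.extraClassPath=/path/to/mysql-connector-java-5.1.35.jar
```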
UPDATE: This pull request fixes the problem: https://github.com/apache/spark/pull/5782
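The core of the fix is to register the JDBC driver through a wrapper class that is itself visible to DriverManager's caller and delegates to the real driver. Here is a minimal, self-contained sketch of that technique; the `ToyDriver` and class names are illustrative stand-ins, not Spark's actual code, and in the real failure the wrapped driver would come from a separate (`--jars`) classloader:

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

// Stand-in for a real JDBC driver that would normally be loaded by a
// non-bootstrap classloader and thus skipped by DriverManager.
class ToyDriver implements Driver {
    @Override public boolean acceptsURL(String url) {
        return url != null && url.startsWith("jdbc:toy:");
    }
    @Override public Connection connect(String url, Properties info) throws SQLException {
        if (!acceptsURL(url)) return null;  // JDBC contract: null for foreign URLs
        throw new SQLException("toy driver: no real database behind this URL");
    }
    @Override public int getMajorVersion() { return 1; }
    @Override public int getMinorVersion() { return 0; }
    @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
        return new DriverPropertyInfo[0];
    }
    @Override public boolean jdbcCompliant() { return false; }
    @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        return Logger.getLogger("toy");
    }
}

// The wrapper class is visible to the caller's classloader, so DriverManager
// accepts it; every call is delegated to the wrapped driver.
class DriverWrapper implements Driver {
    private final Driver wrapped;
    DriverWrapper(Driver wrapped) { this.wrapped = wrapped; }
    @Override public boolean acceptsURL(String url) throws SQLException {
        return wrapped.acceptsURL(url);
    }
    @Override public Connection connect(String url, Properties info) throws SQLException {
        return wrapped.connect(url, info);
    }
    @Override public int getMajorVersion() { return wrapped.getMajorVersion(); }
    @Override public int getMinorVersion() { return wrapped.getMinorVersion(); }
    @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info)
            throws SQLException {
        return wrapped.getPropertyInfo(url, info);
    }
    @Override public boolean jdbcCompliant() { return wrapped.jdbcCompliant(); }
    @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        return wrapped.getParentLogger();
    }
}

public class WrapperDemo {
    public static void main(String[] args) throws Exception {
        DriverManager.registerDriver(new DriverWrapper(new ToyDriver()));
        // DriverManager now resolves the URL through the wrapper.
        Driver d = DriverManager.getDriver("jdbc:toy://localhost/test");
        System.out.println(d.getClass().getSimpleName()); // DriverWrapper
    }
}
```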
UPDATE 2: The fix was merged into Spark 1.4.