java.sql.SQLException: No suitable driver found when loading DataFrame into Spark SQL

Problem Description

I'm hitting very strange problem when trying to load JDBC DataFrame into Spark SQL.

I've tried several Spark clusters - YARN, a standalone cluster, and pseudo-distributed mode on my laptop. It's reproducible on both Spark 1.3.0 and 1.3.1. The problem occurs both in spark-shell and when executing the code with spark-submit. I've tried MySQL and MS SQL JDBC drivers without success.

Consider the following example:

val driver = "com.mysql.jdbc.Driver"
val url = "jdbc:mysql://localhost:3306/test"

val t1 = {
  sqlContext.load("jdbc", Map(
    "url" -> url,
    "driver" -> driver,
    "dbtable" -> "t1",
    "partitionColumn" -> "id",
    "lowerBound" -> "0",
    "upperBound" -> "100",
    "numPartitions" -> "50"
  ))
}

So far so good, the schema gets resolved properly:

t1: org.apache.spark.sql.DataFrame = [id: int, name: string]

But when I evaluate DataFrame:

t1.take(1)

I get the following exception:

15/04/29 01:56:44 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.1.42): java.sql.SQLException: No suitable driver found for jdbc:mysql://<hostname>:3306/test
    at java.sql.DriverManager.getConnection(DriverManager.java:689)
    at java.sql.DriverManager.getConnection(DriverManager.java:270)
    at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:158)
    at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:150)
    at org.apache.spark.sql.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:317)
    at org.apache.spark.sql.jdbc.JDBCRDD.compute(JDBCRDD.scala:309)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

When I try to open a JDBC connection on an executor:

import java.sql.DriverManager

sc.parallelize(0 until 2, 2).map { i =>
  Class.forName(driver)
  val conn = DriverManager.getConnection(url)
  conn.close()
  i
}.collect()

It works perfectly:

res1: Array[Int] = Array(0, 1)

When I run the same code on local Spark, it works perfectly too:

scala> t1.take(1)
...
res0: Array[org.apache.spark.sql.Row] = Array([1,one])

I'm using Spark pre-built with Hadoop 2.4 support.

The easiest way to reproduce the problem is to start Spark in pseudo-distributed mode with the start-all.sh script and run the following command:

/path/to/spark-shell --master spark://<hostname>:7077 --jars /path/to/mysql-connector-java-5.1.35.jar --driver-class-path /path/to/mysql-connector-java-5.1.35.jar

Is there a way to work around this? It looks like a severe problem, so it's strange that googling doesn't help here.

Recommended Answer

Apparently this issue has been recently reported:

https://issues.apache.org/jira/browse/SPARK-6913

The problem is in java.sql.DriverManager, which doesn't see drivers loaded by class loaders other than the bootstrap class loader.

As a temporary workaround, it's possible to add the required drivers to the boot classpath of the executors.
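Concretely, one way to do this (a sketch - the JAR paths are illustrative and must match your setup) is to put the driver JAR on the executors' classpath via `spark.executor.extraClassPath`, in addition to the driver-side options already shown above:

```shell
# Workaround sketch: make the MySQL JDBC driver visible to executor JVMs
# by placing it on their classpath directly (paths are examples).

# Option 1: pass it when starting the shell
/path/to/spark-shell \
  --master spark://<hostname>:7077 \
  --driver-class-path /path/to/mysql-connector-java-5.1.35.jar \
  --conf spark.executor.extraClassPath=/path/to/mysql-connector-java-5.1.35.jar

# Option 2: set it once in conf/spark-defaults.conf on every worker node:
# spark.executor.extraClassPath  /path/to/mysql-connector-java-5.1.35.jar
```

Note that with `extraClassPath` the JAR must already exist at that path on every worker, unlike `--jars`, which ships it but (before the fix) did not make it visible to DriverManager.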

UPDATE: This pull request fixes the problem: https://github.com/apache/spark/pull/5782

UPDATE 2: The fix was merged into Spark 1.4.
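On Spark 1.4+, the same load can also be written with the DataFrameReader API introduced in that release; the class named in the "driver" option is registered by Spark on the executors, so DriverManager can find it. A sketch, assuming the same URL and table as in the question (it needs a running cluster and the driver JAR on the classpath to actually execute):

```scala
// Spark 1.4+ sketch: sqlContext.read replaces sqlContext.load("jdbc", Map(...)).
// The "driver" option ensures the class is registered on executor JVMs too.
val t1 = sqlContext.read.format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/test")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "t1")
  .option("partitionColumn", "id")
  .option("lowerBound", "0")
  .option("upperBound", "100")
  .option("numPartitions", "50")
  .load()

t1.take(1)  // evaluating the DataFrame no longer fails on executors
```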
