Adding JDBC driver to Spark on EMR
Problem description
I'm trying to add a JDBC driver to a Spark cluster that is executing on top of Amazon EMR, but I keep getting a java.sql.SQLException: No suitable driver found exception.
I tried the following:
- Using addJar to add the driver JAR explicitly from the code.
- Using the spark.executor.extraClassPath and spark.driver.extraClassPath parameters (see the sketch after this list for how they were passed).
- Using spark.driver.userClassPathFirst=true. When I used this option I got a different error because of a mix of dependencies with Spark; in any case, this option seems too aggressive if I just want to add a single JAR.
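For reference, a minimal sketch of how the extra class path options were passed on submission (my-jdbc-driver.jar, MainClass, and my-app.jar are placeholder names, not the actual ones):

spark-submit \
  --conf spark.driver.extraClassPath=/home/hadoop/jars/my-jdbc-driver.jar \
  --conf spark.executor.extraClassPath=/home/hadoop/jars/my-jdbc-driver.jar \
  --class MainClass my-app.jar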
Could you please help me with that? How can I introduce the driver to the Spark cluster easily?

Thanks,
David
Source code of the application:
import java.util.Properties

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.SQLContext

// Connection properties for the JDBC source (values redacted).
val properties = new Properties()
properties.put("ssl", "***")
properties.put("user", "***")
properties.put("password", "***")
properties.put("account", "***")
properties.put("db", "***")
properties.put("schema", "***")
properties.put("driver", "***")

val conf = new SparkConf().setAppName("***")
  .setMaster("yarn-cluster")
  .setJars(JavaSparkContext.jarOfClass(this.getClass()))
val sc = new SparkContext(conf)

// Attempt to ship the JDBC driver JAR (path passed as the first argument).
sc.addJar(args(0))

val sqlContext = new SQLContext(sc)

// connectStr is the (redacted) JDBC URL of the source database.
var df = sqlContext.read.jdbc(connectStr, "***", properties = properties)
df = df.select(Constants.***,
  Constants.***,
  Constants.***,
  Constants.***,
  Constants.***,
  Constants.***,
  Constants.***,
  Constants.***,
  Constants.***)
// Additional actions on df
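For context, connectStr above is the JDBC connection URL, defined elsewhere in the application. A hypothetical MySQL-style value, for illustration only (the host, database, and driver class here are made up, not the redacted originals):

// Illustration only: not the actual (redacted) values used above.
val connectStr = "jdbc:mysql://db-host.example.com:3306/mydb"
properties.put("driver", "com.mysql.jdbc.Driver")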
Recommended answer
I had the same problem. What ended up working for me was the --driver-class-path parameter passed to spark-submit.

The main thing is to add the entire Spark class path to --driver-class-path.
These were my steps:
- I got the default driver class path by reading the value of the "spark.driver.extraClassPath" property from the Spark History Server under "Environment".
- Copied the MySQL JAR file to each node in the EMR cluster.
- Put the MySQL JAR path at the front of the --driver-class-path argument to the spark-submit command and appended the value of "spark.driver.extraClassPath" to it.
My driver class path ended up looking like this:

--driver-class-path /home/hadoop/jars/mysql-connector-java-5.1.35.jar:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
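Putting it together, the spark-submit invocation had roughly this shape (MainClass and my-app.jar are placeholder names; the class path is the full value shown above, abbreviated here):

spark-submit \
  --driver-class-path /home/hadoop/jars/mysql-connector-java-5.1.35.jar:/etc/hadoop/conf:... \
  --class MainClass my-app.jar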
This worked on EMR 4.1 using Java with Spark 1.5.0. I had already added the MySQL JAR as a dependency in the Maven pom.xml.
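For completeness, the Maven dependency matching the JAR version above looks like this in pom.xml:

<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.35</version>
</dependency>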
You may also want to look at this answer as it seems like a cleaner solution. I haven't tried it myself.