Adding JDBC driver to Spark on EMR


Problem description

I'm trying to add a JDBC driver to a Spark cluster that is executing on top of Amazon EMR, but I keep getting the following exception:

java.sql.SQLException: No suitable driver found

I tried the following:

  1. Use addJar to add the driver JAR explicitly from the code.
  2. Use the spark.executor.extraClassPath and spark.driver.extraClassPath parameters (roughly as in the sketch after this list).
  3. Use spark.driver.userClassPathFirst=true. When I used this option I got a different error because of a mix of dependencies with Spark; in any case, this option seems too aggressive if I just want to add a single JAR.
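
For reference, the second attempt corresponds roughly to a spark-submit invocation like the one below. This is only a sketch: the driver JAR path, main class, and application JAR name are hypothetical placeholders, not values from the original post.

# hypothetical invocation illustrating attempt 2 (extraClassPath settings);
# the JAR path is also passed as args(0) so the code can call sc.addJar on it
spark-submit \
  --master yarn-cluster \
  --conf spark.driver.extraClassPath=/home/hadoop/jars/my-jdbc-driver.jar \
  --conf spark.executor.extraClassPath=/home/hadoop/jars/my-jdbc-driver.jar \
  --class com.example.MyApp \
  my-app.jar /home/hadoop/jars/my-jdbc-driver.jar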

Could you please help me with that? How can I introduce the driver to the Spark cluster easily?

Thanks,

David

Source code of the application:

import java.util.Properties

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.SQLContext

// JDBC connection properties (values redacted in the original post)
val properties = new Properties()
properties.put("ssl", "***")
properties.put("user", "***")
properties.put("password", "***")
properties.put("account", "***")
properties.put("db", "***")
properties.put("schema", "***")
properties.put("driver", "***")

val conf = new SparkConf().setAppName("***")
      .setMaster("yarn-cluster")
      .setJars(JavaSparkContext.jarOfClass(this.getClass()))

val sc = new SparkContext(conf)
sc.addJar(args(0)) // args(0) is the path to the JDBC driver JAR
val sqlContext = new SQLContext(sc)

var df = sqlContext.read.jdbc(connectStr, "***", properties = properties)
df = df.select( Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***,
                Constants.***)
// Additional actions on df

Recommended answer

I had the same problem. What ended up working for me is to use the --driver-class-path parameter with spark-submit.

The main thing is to add the entire Spark class path to --driver-class-path.

These are my steps:

  1. I got the default driver class path by getting the value of the "spark.driver.extraClassPath" property from the Spark History Server under "Environment" (a command-line way to read the same value is sketched after these steps).
  2. Copied the MySQL JAR file to each node in the EMR cluster.
  3. Put the MySQL JAR path at the front of the --driver-class-path argument to the spark-submit command, and appended the value of "spark.driver.extraClassPath" to it.
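
As a side note, the same default value can usually also be read directly from Spark's configuration on an EMR node; the file location below is the usual EMR default and is an assumption on my part, not something stated in the original answer.

# read the default extraClassPath values from the node's Spark configuration (assumed EMR location)
grep extraClassPath /etc/spark/conf/spark-defaults.conf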

My driver class path ended up looking like this:

--driver-class-path /home/hadoop/jars/mysql-connector-java-5.1.35.jar:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
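
For context, a complete spark-submit command assembled this way might look roughly like the sketch below. Only the --driver-class-path value comes from the answer; the master, main class, and application JAR are hypothetical placeholders.

# sketch only; <default-class-path> stands for the spark.driver.extraClassPath value from step 1
spark-submit \
  --master yarn-cluster \
  --driver-class-path /home/hadoop/jars/mysql-connector-java-5.1.35.jar:<default-class-path> \
  --class com.example.MyApp \
  my-app.jar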

This worked with EMR 4.1 using Java with Spark 1.5.0. I had already added the MySQL JAR as a dependency in the Maven pom.xml.

You may also want to look at this answer, as it seems like a cleaner solution. I haven't tried it myself.
