JDBC Driver not found - On submitting to YARN from Spark

Problem description

Trying to read all rows from a DB table and write them to another, empty target table. So when I issue the following command at the main node, it works as expected:

$ ./bin/spark-submit --class cs.TestJob_publisherstarget \
    --driver-class-path ./lib/mysql-connector-java-5.1.35-bin.jar \
    --jars ./lib/mysql-connector-java-5.1.35-bin.jar,./lib/univocity-parsers-1.5.6.jar,./lib/commons-csv-1.1.1-SNAPSHOT.jar \
    ./lib/uber-ski-spark-job-0.0.1-SNAPSHOT.jar

(Where uber-ski-spark-job-0.0.1-SNAPSHOT.jar is the packaged jar in the ../spark/lib folder and cs.TestJob_publisherstarget is the main class.)

The above command works perfectly for the code: it reads all rows from a table in MySQL and dumps them into the target table, using the JDBC driver supplied with the --jars option.
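
For context, a minimal sketch of the kind of job this is (Spark 1.x Java API; the class name and the "publishers"/"publishers_target" table names are placeholders, since the question only shows that the read goes through SQLContext's JDBC load, as the stack trace below confirms):

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class CopyTableSketch {
    public static void main(String[] args) {
        SparkContext sc = new SparkContext(new SparkConf().setAppName("copy-table-sketch"));
        SQLContext sqlContext = new SQLContext(sc);

        String url = "jdbc:mysql://localhost:3306/pubs?user=root&password=root";

        // Read every row of the source table over JDBC; this takes the
        // same code path as the SQLContext.load call in the trace below.
        Map<String, String> options = new HashMap<String, String>();
        options.put("url", url);
        options.put("dbtable", "publishers");
        DataFrame rows = sqlContext.read().format("jdbc").options(options).load();

        // Append the rows into the (empty) target table.
        rows.write().mode("append").jdbc(url, "publishers_target", new Properties());

        sc.stop();
    }
}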

With everything else remaining the same as above, when I submit the same job to YARN it fails with an exception indicating that the driver can't be found:

$ ./bin/spark-submit --verbose --class cs.TestJob_publisherstarget \
    --master yarn-cluster \
    --driver-class-path ./lib/mysql-connector-java-5.1.35-bin.jar \
    --jars ./lib/mysql-connector-java-5.1.35-bin.jar \
    ./lib/uber-ski-spark-job-0.0.1-SNAPSHOT.jar

Error: application failed with exception
org.apache.spark.SparkException: Application finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:625)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:650)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:577)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:174)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

The exception in the logs:

15/10/12 20:38:59 ERROR yarn.ApplicationMaster: User class threw exception: No suitable driver found for jdbc:mysql://localhost:3306/pubs?user=root&password=root
java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3306/pubs?user=root&password=root
    at java.sql.DriverManager.getConnection(DriverManager.java:596)
    at java.sql.DriverManager.getConnection(DriverManager.java:187)
    at org.apache.spark.sql.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:96)
    at org.apache.spark.sql.jdbc.JDBCRelation.<init>(JDBCRelation.scala:133)
    at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:121)
    at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:219)
    at org.apache.spark.sql.SQLContext.load(SQLContext.scala:697)
    at com.cambridgesemantics.application.sdi.compiler.spark.DataSource.getDataFrame(DataSource.scala:20)
    at cs.TestJob_publisherstarget$.main(TestJob_publisherstarget.scala:29)
    at cs.TestJob_publisherstarget.main(TestJob_publisherstarget.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:484)
15/10/12 20:38:59 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: No suitable driver found for jdbc:mysql://localhost:3306/pubs?user=root&password=root)

Anyway: where am I supposed to put the JDBC driver jar file? I have copied it over to the lib directory of each child node, and still no luck!

Recommended answer

For Spark 1.6, I had this issue storing a DataFrame to Oracle using org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable.

In yarn-cluster mode, I put these options in the submit script:

--conf "spark.driver.extraClassPath=$HOME/jdbc-11.2.0.3.0.jar" \
--conf "spark.executor.extraClassPath=$HOME/jdbc-11.2.0.3.0.jar" \

I also had to put a Class.forName("...") call, as shown below, before the line that does the save:

try {
    // Register the Oracle JDBC driver explicitly before saving
    Class.forName("oracle.jdbc.OracleDriver");
    org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(ds, url, "RD_SPARK_DTL_INCL_HY ", p);
} catch (Exception e) {
    e.printStackTrace(); // error handling elided in the original
}

Of course, you also have to copy the jar to each node. Not pretty, but it works. Hopefully someone can come up with a better solution later.
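
For what it's worth, one candidate for such a solution: since Spark 1.4 the JDBC data source accepts a "driver" connection property that registers the driver class on the executors, which usually makes the explicit Class.forName unnecessary. A sketch, assuming an existing sqlContext and the MySQL setup from the question (URL, credentials, and table names are placeholders):

import java.util.Properties;
import org.apache.spark.sql.DataFrame;

// Placeholders: adjust URL, credentials, driver class, and table names.
Properties props = new Properties();
props.setProperty("user", "root");
props.setProperty("password", "root");
props.setProperty("driver", "com.mysql.jdbc.Driver"); // or oracle.jdbc.OracleDriver

DataFrame df = sqlContext.read().jdbc("jdbc:mysql://localhost:3306/pubs", "publishers", props);
df.write().mode("append").jdbc("jdbc:mysql://localhost:3306/pubs", "publishers_target", props);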

By the way, I do strongly recommend using this API -- it is amazingly convenient and fast.
