Not able to write Spark dataset to database using JDBC
Problem description
I need to write my Spark dataset to an Oracle database table. I am using the dataset write method with append mode, but I get an AnalysisException when the Spark job is triggered on the cluster with the spark2-submit command.
I read a JSON file, flattened it, and stored the result in a dataset named abcDataset.
Spark version - 2
Database - Oracle
JDBC driver - oracle.jdbc.driver.OracleDriver
Programming language - Java
Dataset<Row> abcDataset = dataframe.select(col("abc") /* ... and other columns */);
// Load the connection settings from the classpath
Properties dbProperties = new Properties();
InputStream is = SparkReader.class.getClassLoader().getResourceAsStream("dbProperties.yaml");
dbProperties.load(is);
String jdbcUrl = dbProperties.getProperty("jdbcUrl");
dbProperties.put("driver", "oracle.jdbc.driver.OracleDriver");
// Schema-qualified target table
String where = "USER123.PERSON";
abcDataset.write()
    .format("org.apache.spark.sql.execution.datasources.jdbc.DefaultSource")
    .option("driver", "oracle.jdbc.driver.OracleDriver")
    .mode("append")
    .jdbc(jdbcUrl, where, dbProperties);
Expected: the dataset is written to the database. Actual: the job fails with the error below -
org.apache.spark.sql.AnalysisException: Multiple sources found for jdbc (org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider, org.apache.spark.sql.execution.datasources.jdbc.DefaultSource), please specify the fully qualified class name.;
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:670)
Since I am running this on a cluster, do I need to set any additional property in the spark-submit command, or am I missing a step?
Recommended answer
When writing from Spark to an RDBMS over JDBC, use either abcDataset.write().jdbc(...) or abcDataset.write().format("jdbc").
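The two forms named in the answer could look roughly like the sketch below in Java. The class name JdbcAppendSketch, the method names, and the user and password parameters are made up for illustration; abcDataset, jdbcUrl, the table name and dbProperties are the values from the question. As far as I know, writing through format("jdbc") followed by save() needs Spark 2.1 or later, so on an older 2.0 build only the first variant is likely to work.

```java
import java.util.Properties;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

// Hypothetical helper class, not part of the original question.
public final class JdbcAppendSketch {

    private JdbcAppendSketch() {
    }

    // Variant 1: the jdbc(...) shortcut. It selects the JDBC source itself,
    // so no format(...) call is made. The driver class is expected to be
    // present in dbProperties under the "driver" key, as in the question.
    static void appendWithJdbcShortcut(Dataset<Row> abcDataset, String jdbcUrl,
                                       String table, Properties dbProperties) {
        abcDataset.write()
                .mode(SaveMode.Append)
                .jdbc(jdbcUrl, table, dbProperties);
    }

    // Variant 2: the generic writer with the short name "jdbc".
    // Connection details are passed as options and the write ends with save().
    // The user and password parameters are assumptions for this sketch.
    static void appendWithFormat(Dataset<Row> abcDataset, String jdbcUrl,
                                 String table, String user, String password) {
        abcDataset.write()
                .format("jdbc")
                .option("url", jdbcUrl)
                .option("dbtable", table)
                .option("driver", "oracle.jdbc.driver.OracleDriver")
                .option("user", user)
                .option("password", password)
                .mode(SaveMode.Append)
                .save();
    }
}
```

With the values from the question, the first variant would be called as JdbcAppendSketch.appendWithJdbcShortcut(abcDataset, jdbcUrl, "USER123.PERSON", dbProperties). Both variants need the Oracle JDBC driver jar on the driver and executor classpath at submit time.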