Write data to Redshift using Spark 2.0.1
Problem Description
I am doing a POC where I want to write a simple data set to Redshift.
I have the following sbt file:
name := "Spark_POC"
version := "1.0"
scalaVersion := "2.10.6"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.0.1"
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "2.0.1"
resolvers += "jitpack" at "https://jitpack.io"
libraryDependencies += "com.databricks" %% "spark-redshift" % "3.0.0-preview1"
And the following code:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object Main extends App {

  val conf = new SparkConf().setAppName("Hello World").setMaster("local[2]")

  // Point Hadoop at a local installation containing winutils.exe (Windows only)
  System.setProperty("hadoop.home.dir", "C:\\Users\\Srdjan Nikitovic\\Desktop\\scala\\hadoop")

  val spark = SparkSession
    .builder()
    .appName("Spark 1")
    .config(conf)
    .getOrCreate()

  // Temporary S3 directory that spark-redshift uses to stage data before the Redshift COPY
  val tempS3Dir = "s3n://access_key:secret_access_key@bucket_location"

  // S3 credentials for the staging directory
  spark.sparkContext.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
  spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "access_key")
  spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "secret_access_key")

  val data =
    spark
      .read
      .csv("hello.csv")

  data.write
    .format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://redshift_server:5439/database?user=user_name&password=password")
    .option("dbtable", "public.testSpark")
    .option("forward_spark_s3_credentials", true)
    .option("tempdir", tempS3Dir)
    .mode("error")
    .save()
}
I am running the code from a local Windows machine, through IntelliJ.
I get the following error:
Exception in thread "main" java.lang.ClassNotFoundException: Could not load an Amazon Redshift JDBC driver; see the README for instructions on downloading and configuring the official Amazon driver.
I have tried almost all versions of the spark-redshift driver (1.0.0, 2.0.0, 2.0.1 and now 3.0.0-preview1), but I can't get this code to work.
Any help?
Recommended Answer
You first need to download the Amazon Redshift JDBC driver. Then you must tell Spark about it in the environment where this code is running, e.g. for a spark-shell running on EMR:
spark-shell … --jars /usr/share/aws/redshift/jdbc/RedshiftJDBC41.jar
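If, as in the question, the code runs locally through IntelliJ and sbt rather than on EMR, the fix is the same: get the driver jar onto the application classpath. One way is to download RedshiftJDBC41.jar from Amazon and drop it into the project's lib/ directory, which sbt adds to the classpath automatically. Alternatively, here is a minimal build.sbt sketch pulling the driver from Amazon's Redshift Maven repository; the artifact id and version below are assumptions, so check Amazon's Redshift JDBC documentation for the release you actually want:

// Assumed coordinates -- verify the repository URL, artifact id and
// version against Amazon's Redshift JDBC driver documentation.
resolvers += "redshift" at "https://s3.amazonaws.com/redshift-maven-repository/release"
libraryDependencies += "com.amazon.redshift" % "redshift-jdbc41" % "1.2.1.1001"

Once the driver is on the classpath, the ClassNotFoundException from spark-redshift should go away without any changes to the code itself.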