Write data to Redshift using Spark 2.0.1

Problem description

I am doing a POC where I want to write a simple data set to Redshift.

I have the following sbt file:

name := "Spark_POC"

version := "1.0"

scalaVersion := "2.10.6"


libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.0.1"

libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "2.0.1"

resolvers += "jitpack" at "https://jitpack.io"

libraryDependencies += "com.databricks" %% "spark-redshift" % "3.0.0-preview1"

And the following code:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object Main extends App {

  val conf = new SparkConf().setAppName("Hello World").setMaster("local[2]")

  System.setProperty("hadoop.home.dir", "C:\\Users\\Srdjan Nikitovic\\Desktop\\scala\\hadoop")

  val spark = SparkSession
    .builder()
    .appName("Spark 1")
    .config(conf)
    .getOrCreate()


  val tempS3Dir = "s3n://access_key:secret_access_key@bucket_location"

  spark.sparkContext.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
  spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "access_key")
  spark.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "secret_access_key")

  val data =
    spark
      .read
      .csv("hello.csv")

  data.write
    .format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://redshift_server:5439/database?user=user_name&password=password")
    .option("dbtable", "public.testSpark")
    .option("forward_spark_s3_credentials",true)
    .option("tempdir", tempS3Dir)
    .mode("error")
    .save()
}

I am running the code from my local Windows machine, through IntelliJ.

I get the following error:

Exception in thread "main" java.lang.ClassNotFoundException: Could not load an Amazon Redshift JDBC driver; see the README for instructions on downloading and configuring the official Amazon driver.

I have tried almost all versions of the Spark-Redshift driver (1.0.0, 2.0.0, 2.0.1 and now 3.0.0-PREVIEW), and I can't get this code to work.

Any help?

Recommended answer

You first need to download the Redshift JDBC driver from Amazon.

Then you must tell Spark about it in the environment where this code is running, e.g. for a spark-shell running on EMR:

spark-shell … --jars /usr/share/aws/redshift/jdbc/RedshiftJDBC41.jar
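
Since the question runs the code locally from IntelliJ rather than through spark-shell on EMR, the equivalent step there is to put the downloaded driver jar on the project's classpath. A minimal sketch of one way to do that in the existing sbt build, assuming the jar was saved under the project's lib/ directory (the file name below is a placeholder for whichever version was downloaded):

// Addition to build.sbt -- a sketch, not part of the original answer.
// sbt already treats jars under lib/ as unmanaged dependencies, so copying the
// downloaded Amazon Redshift JDBC jar into lib/ is usually enough; the explicit
// setting below just documents the location in the build file.
unmanagedJars in Compile += Attributed.blank(file("lib/RedshiftJDBC41.jar"))

With the jar on the classpath, the driver class is visible to the JVM that IntelliJ starts, and spark-redshift can load it when it opens the JDBC connection.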
