How to connect to Amazon Redshift or other DB's in Apache Spark?
Question
I'm trying to connect to Amazon Redshift via Spark, so I can join data we have on S3 with data on our RS cluster. I found some very spartan documentation here for the capability of connecting via JDBC:
https://spark.apache.org/docs/1.3.1/sql-programming-guide.html#jdbc-to-other-databases
The load command seems fairly straightforward (although I don't know how I would enter AWS credentials here; maybe in the options?).
df = sqlContext.load(source="jdbc", url="jdbc:postgresql:dbserver", dbtable="schema.tablename")
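On the credentials question: the database user and password can ride along as extra options to the JDBC source. A minimal sketch against the Spark 1.3-era API, where the host, database name, and credential values are all placeholders; note that AWS keys are a separate concern and only matter for the S3 side, not the JDBC connection:

```python
# Hypothetical connection details for the Spark 1.3-era JDBC source.
# The database user/password go in the load options; AWS credentials
# are only needed for reading S3, not for the JDBC connection itself.
jdbc_options = {
    "url": "jdbc:postgresql://dbserver:5439/mydb",  # 5439 is Redshift's default port
    "dbtable": "schema.tablename",
    "user": "my_user",
    "password": "my_password",
}
# df = sqlContext.load(source="jdbc", **jdbc_options)
```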
And I'm not entirely sure how to deal with the SPARK_CLASSPATH variable. I'm running Spark locally for now through an IPython notebook (as part of the Spark distribution). Where do I define it so that Spark picks it up?
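For a notebook-launched PySpark, one way to get a JDBC driver jar onto the classpath is to set PYSPARK_SUBMIT_ARGS in the environment before starting the notebook. This is a sketch of one common setup, not the only option, and the jar path is a placeholder:

```shell
# Make a JDBC driver jar visible to PySpark launched from a notebook by
# setting PYSPARK_SUBMIT_ARGS before the notebook process starts.
# The jar path below is a placeholder.
export PYSPARK_SUBMIT_ARGS="--driver-class-path /path/to/postgresql.jar --jars /path/to/postgresql.jar pyspark-shell"
# then launch the notebook-backed shell, e.g. (Spark 1.x style):
#   IPYTHON=1 bin/pyspark
```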
Anyway, when I try running these commands, I get a bunch of undecipherable errors, so I'm stuck for now. Any help or pointers to detailed tutorials are appreciated.
Answer
Although this seems to be a very old post, for anyone still looking for an answer, the steps below worked for me!
Start the shell with the JDBC driver jar included:
bin/pyspark --driver-class-path /path_to_postgresql-42.1.4.jar --jars /path_to_postgresql-42.1.4.jar
Create a DataFrame by giving the appropriate connection details:
myDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:redshift://host:port/db_name") \
    .option("dbtable", "table_name") \
    .option("user", "user_name") \
    .option("password", "password") \
    .load()
Spark version: 2.2
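One detail worth noting: the launch command above puts the PostgreSQL driver on the classpath, but the jdbc:redshift URL scheme expects Amazon's own Redshift JDBC driver. With only the PostgreSQL jar, a jdbc:postgresql URL also works against Redshift, which listens on port 5439 by default. A small sketch with a hypothetical helper and hypothetical host/database names:

```python
# Hypothetical helper: build a PostgreSQL-scheme JDBC URL for a
# Redshift endpoint (5439 is Redshift's default port), usable with
# the plain PostgreSQL driver loaded by the shell command above.
def redshift_jdbc_url(host, db_name, port=5439):
    return "jdbc:postgresql://%s:%d/%s" % (host, port, db_name)

url = redshift_jdbc_url("my-cluster.example.us-east-1.redshift.amazonaws.com", "db_name")
# myDF = spark.read.format("jdbc").option("url", url) ... .load()  as above
```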