How to connect to Amazon Redshift or other databases in Apache Spark?


Problem description

I'm trying to connect to Amazon Redshift via Spark so that I can join data we have on S3 with data on our Redshift cluster. I found some very sparse documentation on connecting over JDBC here:

https://spark.apache.org/docs/1.3.1/sql-programming-guide.html#jdbc-to-other-databases

The load command seems fairly straightforward, although I don't know how I would enter AWS credentials here (maybe in the options?):

df = sqlContext.load(source="jdbc", url="jdbc:postgresql:dbserver", dbtable="schema.tablename")

I'm also not entirely sure how to deal with the SPARK_CLASSPATH variable. For now I'm running Spark locally through an IPython notebook (as part of the Spark distribution). Where do I define the variable so that Spark picks it up?

Anyway, when I try running these commands I get a bunch of undecipherable errors, so I'm kind of stuck. Any help or pointers to detailed tutorials would be appreciated.

Recommended answer

If you're using Spark 1.4.0 or newer, check out spark-redshift, a library that supports loading data from Redshift into Spark SQL DataFrames and saving DataFrames back to Redshift. If you're querying large volumes of data, this approach should perform better than JDBC because it can unload and query the data in parallel.
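A minimal sketch of what a read through spark-redshift might look like from PySpark. The data source name `com.databricks.spark.redshift` and the `url`, `dbtable`, and `tempdir` options follow the library's README as I understand it, so check them against the version you install. The helper name `read_redshift_table` and every value in the usage comment are placeholders, not part of the library.

```python
def read_redshift_table(sql_context, jdbc_url, table, tempdir):
    """Load a Redshift table as a DataFrame via spark-redshift (sketch).

    sql_context: an existing SQLContext (Spark 1.4+); the spark-redshift
                 package and a JDBC driver are assumed to be on the classpath.
    jdbc_url:    JDBC URL of the cluster, including database user and password.
    table:       table to read, e.g. "schema.tablename".
    tempdir:     S3 location used to stage the unloaded data (assumed writable).
    """
    return (
        sql_context.read
        .format("com.databricks.spark.redshift")  # source name assumed from the README
        .option("url", jdbc_url)
        .option("dbtable", table)
        .option("tempdir", tempdir)
        .load()
    )

# Hypothetical usage; the URL and bucket are placeholders:
# df = read_redshift_table(sqlContext, "<jdbc url>", "schema.tablename",
#                          "s3n://example-bucket/tmp/")
```

Because the data is staged in S3, Spark also needs AWS credentials for that bucket. Those are separate from the database user and password carried in the JDBC URL.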

If you still want to use JDBC, check out the new built-in JDBC data source in Spark 1.4+.
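For the JDBC route, here is a sketch under a few assumptions. Redshift speaks the PostgreSQL wire protocol, so the PostgreSQL JDBC driver is used, and the database user and password (not AWS credentials) are passed as URL parameters. The helpers `redshift_jdbc_url` and `read_jdbc_table`, the host name, and the credentials are all hypothetical. The driver jar still has to be on Spark's classpath, for example through the `--driver-class-path` and `--jars` launch options.

```python
from urllib.parse import quote

def redshift_jdbc_url(host, database, user, password, port=5439):
    """Build a PostgreSQL-style JDBC URL for a Redshift cluster (hypothetical helper).

    5439 is Redshift's default port. User and password are percent-encoded
    so that characters such as '&' or '=' cannot break the query string.
    """
    return "jdbc:postgresql://{0}:{1}/{2}?user={3}&password={4}".format(
        host, port, database, quote(user, safe=""), quote(password, safe="")
    )

def read_jdbc_table(sql_context, url, table):
    """Load a table through Spark's built-in JDBC data source (Spark 1.4+)."""
    return (
        sql_context.read
        .format("jdbc")
        .options(url=url, dbtable=table, driver="org.postgresql.Driver")
        .load()
    )

# Hypothetical usage with placeholder values:
# url = redshift_jdbc_url("example-host", "dev", "alice", "secret")
# df = read_jdbc_table(sqlContext, url, "schema.tablename")
```

The resulting DataFrame can then be joined with DataFrames loaded from S3 in the usual way.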

Disclosure: I am affiliated with spark-redshift.
