Pyspark connection to Postgres database in ipython notebook

Problem description

I've read previous posts on this, but I still cannot pinpoint why I am unable to connect my ipython notebook to a Postgres db.

I am able to launch pyspark in an ipython notebook, and SparkContext is loaded as 'sc'.

I have the following in my .bash_profile for finding the Postgres driver:

export SPARK_CLASSPATH=/path/to/downloaded/jar

Here's what I am doing in the ipython notebook to connect to the db (based on this post):

from pyspark.sql import SQLContext
from pyspark.sql import DataFrameReader as dfr
sqlContext = SQLContext(sc)

table = 'some query'
url = 'postgresql://localhost:5432/dbname'
properties = {'user': 'username', 'password': 'password'}

df = dfr(sqlContext).jdbc(
    url='jdbc:%s' % url, table=table, properties=properties
)

The error:

Py4JJavaError: An error occurred while calling o156.jdbc.
: java.sql.SQLException: No suitable driver.

I understand it's an error with finding the driver I've downloaded, but I don't understand why I am getting this error when I've added the path to it in my .bash_profile.

I also tried to set driver via pyspark --jars, but I get a "no such file or directory" error.
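
A "no such file or directory" error from --jars usually means the shell could not resolve the jar path. As a sanity check, passing the absolute path of the jar to both --jars and --driver-class-path is a reasonable sketch (the location below is hypothetical; adjust to where the driver was actually downloaded):

 pyspark --jars /path/to/postgresql-42.1.4.jar --driver-class-path /path/to/postgresql-42.1.4.jar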

This blog post also shows how to connect to Postgres data sources, but the following also gives me a "no such directory" error:

 ./bin/spark-shell --packages org.postgresql:postgresql:42.1.4
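
A likely cause of the "no such directory" error here: ./bin/spark-shell is a relative path that only resolves from the Spark installation root, so running it from any other directory fails before the --packages option is even parsed. A sketch, assuming a hypothetical install location:

 cd /path/to/spark-2.2.0   # hypothetical Spark installation root
 ./bin/spark-shell --packages org.postgresql:postgresql:42.1.4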

Additional information:

spark version: 2.2.0
python version: 3.6
java: 1.8.0_25
postgres driver: 42.1.4

Recommended answer

I followed the directions in this post. SparkContext is already set as sc for me, so all I had to do was remove the SPARK_CLASSPATH setting from my .bash_profile and use the following in my ipython notebook:

import os

# Must be set before the SparkContext is created, since these arguments
# are used when the notebook launches the JVM.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--driver-class-path /path/to/postgresql-42.1.4.jar --jars /path/to/postgresql-42.1.4.jar pyspark-shell'

I added a 'driver' setting to properties as well, and it worked. As stated elsewhere in this post, this is likely because SPARK_CLASSPATH is deprecated, and it is preferable to use --driver-class-path.
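
For completeness, a minimal sketch of the resulting read with the 'driver' entry added to properties (the url, table name, and credentials are placeholders):

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)  # sc is already provided by the notebook

url = 'jdbc:postgresql://localhost:5432/dbname'
properties = {
    'user': 'username',
    'password': 'password',
    'driver': 'org.postgresql.Driver',  # class name of the Postgres JDBC driver
}

df = sqlContext.read.jdbc(url=url, table='tablename', properties=properties)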
