Not able to connect to postgres using jdbc in pyspark shell


Problem description

I am using a standalone cluster on my local Windows machine and trying to load data from one of our servers using the following code:

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.load(source="jdbc", url="jdbc:postgresql://host/dbname", dbtable="schema.tablename")

I have set SPARK_CLASSPATH as:

# use a raw string so the Windows backslashes are not treated as escape sequences
os.environ['SPARK_CLASSPATH'] = r"C:\Users\ACERNEW3\Desktop\Spark\spark-1.3.0-bin-hadoop2.4\postgresql-9.2-1002.jdbc3.jar"

While executing sqlContext.load, it throws an error mentioning "No suitable driver found for jdbc:postgresql". I have tried searching the web, but was not able to find a solution.
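This error means java.sql.DriverManager could not find a registered driver for the jdbc:postgresql URL. One common workaround, separate from the classpath question, is to name the driver class explicitly; a minimal sketch, assuming the jar is already on the classpath and that this Spark version's JDBC source accepts a driver option:

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)  # sc is the SparkContext provided by the pyspark shell
# "driver" names the JDBC driver class explicitly, so Spark loads it with
# Class.forName instead of relying on DriverManager auto-discovery
df = sqlContext.load(source="jdbc",
                     url="jdbc:postgresql://host/dbname",
                     dbtable="schema.tablename",
                     driver="org.postgresql.Driver")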

Recommended answer

I had the same problem with MySQL, and was never able to get it to work with the SPARK_CLASSPATH approach. However, I did get it to work with extra command-line arguments; see the answer to this question: http://stackoverflow.com/questions/29821518/apache-spark-jdbc-connection-not-working/30947090#30947090

To avoid having to click through to get it working, here's what you have to do:

pyspark --conf spark.executor.extraClassPath=<jdbc.jar> --driver-class-path <jdbc.jar> --jars <jdbc.jar> --master <master-URL>
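For example, with the jar path from the question (the local[*] master is an assumption for a local standalone run):

pyspark --conf spark.executor.extraClassPath=C:\Users\ACERNEW3\Desktop\Spark\spark-1.3.0-bin-hadoop2.4\postgresql-9.2-1002.jdbc3.jar --driver-class-path C:\Users\ACERNEW3\Desktop\Spark\spark-1.3.0-bin-hadoop2.4\postgresql-9.2-1002.jdbc3.jar --jars C:\Users\ACERNEW3\Desktop\Spark\spark-1.3.0-bin-hadoop2.4\postgresql-9.2-1002.jdbc3.jar --master local[*]

This puts the driver jar on the driver's classpath, on the executors' classpath, and ships it to the workers, covering all the places the class may need to be visible.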

