PySpark sqlContext read Postgres 9.6 NullPointerException
Problem description

Trying to read a table with PySpark from a Postgres DB. I have set up the following code and verified that a SparkContext exists:
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--driver-class-path /tmp/jars/postgresql-42.0.0.jar --jars /tmp/jars/postgresql-42.0.0.jar pyspark-shell'

from pyspark import SparkContext, SparkConf

conf = SparkConf()
conf.setMaster("local[*]")
conf.setAppName('pyspark')
sc = SparkContext(conf=conf)

from pyspark.sql import SQLContext

properties = {
    "driver": "org.postgresql.Driver"
}
url = 'jdbc:postgresql://tom:@localhost/gqp'

sqlContext = SQLContext(sc)
sqlContext.read \
    .format("jdbc") \
    .option("url", url) \
    .option("driver", properties["driver"]) \
    .option("dbtable", "specimen") \
    .load()
I get the following error:
Py4JJavaError: An error occurred while calling o812.load. : java.lang.NullPointerException
The name of my database is gqp, the table is specimen, and I have verified it is running on localhost using the Postgres.app macOS app.
Answer
The URL was the problem!

It was originally: url = 'jdbc:postgresql://tom:@localhost/gqp'

I removed the tom:@ part, and it worked. The URL must follow the pattern jdbc:postgresql://ip_address:port/db_name, whereas mine was copied directly from a Flask project.
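As a minimal sketch of the fix: build the JDBC URL from host, port, and database name only, and pass credentials separately as options rather than embedding them in the URL (the `make_jdbc_url` helper below is hypothetical, added only for illustration):

```python
def make_jdbc_url(host, db_name, port=5432):
    # JDBC URLs follow the pattern jdbc:postgresql://ip_address:port/db_name;
    # unlike SQLAlchemy/Flask-style URLs, they must NOT contain user:password@.
    return f"jdbc:postgresql://{host}:{port}/{db_name}"

url = make_jdbc_url("localhost", "gqp")
print(url)  # jdbc:postgresql://localhost:5432/gqp

# Credentials then go in the reader options (or the properties dict), e.g.:
# df = (sqlContext.read.format("jdbc")
#       .option("url", url)
#       .option("user", "tom")
#       .option("driver", "org.postgresql.Driver")
#       .option("dbtable", "specimen")
#       .load())
```

The key difference from the Flask-style URL is that the JDBC driver parses `tom:@` as part of the host, which is what triggered the NullPointerException.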
If you're reading this, hope you didn't make this same mistake :)