How to connect (Py)Spark to Postgres database using JDBC
Question
I have followed instructions from this posting to read data from an existing Postgres database with a table named "objects", as defined and created by the Objects class in SQLAlchemy. In my Jupyter notebook, my code is:
from pyspark import SparkContext
from pyspark import SparkConf
from random import random
# Spark configuration: run locally using all available cores
conf = SparkConf()
conf.setMaster("local[*]")
conf.setAppName('pyspark')
sc = SparkContext(conf=conf)
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
properties = {
"driver": "org.postgresql.Driver"
}
url = 'jdbc:postgresql://PG_USER:PASSWORD@PG_SERVER_IP/db_name'
df = sqlContext.read.jdbc(url=url, table='objects', properties=properties)
The last line results in the following:
Py4JJavaError: An error occurred while calling o25.jdbc.
: java.lang.NullPointerException
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:158)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:117)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:237)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:159)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:211)
at java.lang.Thread.run(Thread.java:745)
So it looks like Spark can't resolve the table. How do I test from here to make sure that I am connected to the database properly?
Answer
Problems with name resolution are indicated by org.postgresql.util.PSQLException and don't result in an NPE. The source of the issue is actually the connection string, in particular the way you provide user credentials. At first glance it looks like a bug, but if you're looking for a quick solution you can either use URL properties:
url = 'jdbc:postgresql://PG_SERVER_IP/db_name?user=PG_USER&password=PASSWORD'
or the properties argument:
properties = {
"user": "PG_USER",
"password": "PASSWORD",
"driver": "org.postgresql.Driver"
}