How to connect (Py)Spark to Postgres database using JDBC

Question

I have followed the instructions from this posting to read data from an existing Postgres database with a table named "objects", as defined and created by the Objects class in SQLAlchemy. In my Jupyter notebook, my code is:

from pyspark import SparkContext
from pyspark import SparkConf
from random import random

#spark conf
conf = SparkConf()
conf.setMaster("local[*]")
conf.setAppName('pyspark')

sc = SparkContext(conf=conf)

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
properties = {
    "driver": "org.postgresql.Driver"
}
url = 'jdbc:postgresql://PG_USER:PASSWORD@PG_SERVER_IP/db_name'
df = sqlContext.read.jdbc(url=url, table='objects', properties=properties)

The last line results in the following:

Py4JJavaError: An error occurred while calling o25.jdbc.
: java.lang.NullPointerException
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:158)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:117)
    at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:237)
    at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:159)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:211)
    at java.lang.Thread.run(Thread.java:745)

So it looks like it can't resolve the table. How do I test from here to make sure that I am connected to the database properly?

Answer

Problems with name resolution are indicated by org.postgresql.util.PSQLException and don't result in an NPE. The source of the issue is actually the connection string, in particular the way you provide the user credentials. At first glance it looks like a bug, but if you're looking for a quick solution you can either use URL properties:

url = 'jdbc:postgresql://PG_SERVER_IP/db_name?user=PG_USER&password=PASSWORD'
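
With the credentials passed as URL parameters, the read call from the question only needs the driver entry; a minimal sketch, reusing the 'objects' table name from the question:

df = sqlContext.read.jdbc(url=url, table='objects',
                          properties={"driver": "org.postgresql.Driver"})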

or the properties argument:

properties = {
    "user": "PG_USER",
    "password": "PASSWORD",
    "driver": "org.postgresql.Driver"
}
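
In that case the credentials are dropped from the URL itself. As a minimal sketch, assuming the same placeholder host and database name as in the question, a quick way to confirm that the connection and table resolution work is to let Spark fetch the schema and run a simple count:

url = 'jdbc:postgresql://PG_SERVER_IP/db_name'
df = sqlContext.read.jdbc(url=url, table='objects', properties=properties)

# If the connection succeeds, this prints the schema of the "objects" table,
# and the count forces an actual round trip to the database.
df.printSchema()
print(df.count())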
