Apache Spark: JDBC connection not working


Problem description

I have asked this question previously as well but did not get any answer (Not able to connect to postgres using jdbc in pyspark shell).

I have successfully installed Spark 1.3.0 on my local Windows machine and ran sample programs to test it using the pyspark shell.

Now I want to run correlations from MLlib on data stored in PostgreSQL, but I am not able to connect to PostgreSQL.

I have successfully added the required jar (which I have tested) to the classpath by running:

pyspark --jars "C:\path\to\jar\postgresql-9.2-1002.jdbc3.jar"

I can see that the jar is successfully added in the Environment UI.

When I run the following in the pyspark shell:

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.load(source="jdbc", url="jdbc:postgresql://[host]/[dbname]", dbtable="[schema.table]")

I get this ERROR:

>>> df = sqlContext.load(source="jdbc",url="jdbc:postgresql://[host]/[dbname]", dbtable="[schema.table]")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\ACERNEW3\Desktop\Spark\spark-1.3.0-bin-hadoop2.4\python\pyspark\sql\context.py", line 482, in load
    df = self._ssql_ctx.load(source, joptions)
  File "C:\Users\ACERNEW3\Desktop\Spark\spark-1.3.0-bin-hadoop2.4\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py", line 538, in __call__
  File "C:\Users\ACERNEW3\Desktop\Spark\spark-1.3.0-bin-hadoop2.4\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o20.load.
: java.sql.SQLException: No suitable driver found for jdbc:postgresql://[host]/[dbname]
        at java.sql.DriverManager.getConnection(DriverManager.java:602)
        at java.sql.DriverManager.getConnection(DriverManager.java:207)
        at org.apache.spark.sql.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:94)
        at org.apache.spark.sql.jdbc.JDBCRelation.<init>(JDBCRelation.scala:125)
        at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:114)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:290)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:679)
        at org.apache.spark.sql.SQLContext.load(SQLContext.scala:667)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:619)

Solution

I had this exact problem with mysql/mariadb, and got a BIG clue from this question. The underlying issue is that --jars alone does not put the driver jar on the Spark driver's own classpath, so java.sql.DriverManager in the driver JVM never sees the PostgreSQL driver class; the jar also has to be passed to the driver and executor classpaths explicitly.

So your pyspark command should be:

pyspark --conf spark.executor.extraClassPath=<jdbc.jar> --driver-class-path <jdbc.jar> --jars <jdbc.jar> --master <master-URL>
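
For example, with the jar path from the question filled in (the local[*] master here is just an illustration, substitute your own):

pyspark --conf spark.executor.extraClassPath=C:\path\to\jar\postgresql-9.2-1002.jdbc3.jar --driver-class-path C:\path\to\jar\postgresql-9.2-1002.jdbc3.jar --jars C:\path\to\jar\postgresql-9.2-1002.jdbc3.jar --master local[*]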

Also watch for errors when pyspark starts, like "Warning: Local jar ... does not exist, skipping." and "ERROR SparkContext: Jar not found at ...", which probably mean you spelled the path wrong.
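
Once the jar is on both classpaths, the original load should work. As a minimal sketch (keeping the question's placeholders, and assuming the driver option that Spark 1.3's JDBC source documents for naming the driver class), you can also pass org.postgresql.Driver explicitly as an extra safeguard:

from pyspark.sql import SQLContext

# sc is the SparkContext the pyspark shell creates for you
sqlContext = SQLContext(sc)

# Naming the driver class explicitly sidesteps DriverManager's
# classloader lookup, the usual cause of "No suitable driver found"
df = sqlContext.load(source="jdbc",
                     url="jdbc:postgresql://[host]/[dbname]",
                     dbtable="[schema.table]",
                     driver="org.postgresql.Driver")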
