How to add third-party Java JAR files for use in PySpark


Question

I have some third-party database client libraries in Java. I want to access them through

java_gateway.py

E.g.: to make the client class (not a JDBC driver!) available to the Python client via the Java gateway:

java_import(gateway.jvm, "org.mydatabase.MyDBClient")
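
In a PySpark program, the gateway in the snippet above would typically be the one PySpark itself creates for the SparkContext. A minimal sketch of how it is usually reached, assuming a running context named sc (note that _gateway is an internal attribute, and org.mydatabase.MyDBClient is the question's placeholder class):

from py4j.java_gateway import java_import
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Reuse the Py4J gateway that PySpark started for the driver JVM
java_import(sc._gateway.jvm, "org.mydatabase.MyDBClient")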

It is not clear where to add the third-party libraries to the JVM classpath. I tried adding them in compute-classpath.sh, but that did not seem to work. I get:

Py4JError: Trying to call a package

Also, comparing with Hive: the Hive JAR files are not loaded via compute-classpath.sh, which makes me suspicious. There seems to be some other mechanism setting up the JVM-side classpath.

Answer

You can add external JARs as arguments to pyspark:

pyspark --jars file1.jar,file2.jar
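
As a usage note, once the shell has been launched with the JARs on the classpath, the classes they contain should be reachable from Python through the driver JVM. A minimal sketch tying this back to the question, with the JAR name, class, and no-argument constructor as assumed placeholders:

# Assuming the shell was started as above with the JAR that contains the class,
# e.g.  pyspark --jars mydb-client.jar  (mydb-client.jar is a placeholder),
# the class can then be used either via the java_import call from the question
# or directly by its fully qualified name on the driver JVM view:
client = sc._jvm.org.mydatabase.MyDBClient()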
