How to add third party java jars for use in pyspark


Problem description

I have some third party Database client libraries in Java. I want to access them through

java_gateway.py

E.g., to make the client class (not a JDBC driver!) available to the Python client via the Java gateway:

java_import(gateway.jvm, "org.mydatabase.MyDBClient")

It is not clear where to add the third party libraries to the JVM classpath. I tried adding them to compute-classpath.sh, but that did not seem to work: I get

 Py4jError: Trying to call a package

Also, when comparing to Hive: the Hive jar files are NOT loaded via compute-classpath.sh, which makes me suspicious. There seems to be some other mechanism for setting up the JVM-side classpath.

Recommended answer

You can add external jars as arguments to pyspark

pyspark --jars file1.jar,file2.jar
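As a sketch of the surrounding options (the jar names are the answer's placeholders, and `my_app.py` is a hypothetical script name): the same `--jars` flag works for `spark-submit`, and the jar list can equivalently be supplied through the `spark.jars` configuration property. The list is comma-separated with no spaces.

```shell
# Launch an interactive PySpark shell with extra jars on the
# driver and executor classpaths (comma-separated, no spaces):
pyspark --jars file1.jar,file2.jar

# The same flag works when submitting an application:
spark-submit --jars file1.jar,file2.jar my_app.py

# Equivalently, pass the jar list as a configuration property:
pyspark --conf spark.jars=file1.jar,file2.jar
```

Once the jars are on the classpath this way, the `java_import(gateway.jvm, ...)` call from the question should resolve the class instead of raising `Py4JError: Trying to call a package`.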

