Add jar to pyspark when using notebook
Question
I'm trying the mongodb hadoop integration with spark but can't figure out how to make the jars accessible to an IPython notebook.
Here's what I'm trying to do:
# set up parameters for reading from MongoDB via Hadoop input format
config = {"mongo.input.uri": "mongodb://localhost:27017/db.collection"}
inputFormatClassName = "com.mongodb.hadoop.MongoInputFormat"
# these values worked but others might as well
keyClassName = "org.apache.hadoop.io.Text"
valueClassName = "org.apache.hadoop.io.MapWritable"
# Do some reading from mongo
items = sc.newAPIHadoopRDD(inputFormatClassName, keyClassName, valueClassName, None, None, config)
This code works fine when I launch it in pyspark using the following command:
spark-1.4.1/bin/pyspark --jars 'mongo-hadoop-core-1.4.0.jar,mongo-java-driver-3.0.2.jar'
where mongo-hadoop-core-1.4.0.jar and mongo-java-driver-2.10.1.jar allow using MongoDB from Java. However, when I do this:
IPYTHON_OPTS="notebook" spark-1.4.1/bin/pyspark --jars 'mongo-hadoop-core-1.4.0.jar,mongo-java-driver-3.0.2.jar'
The jars are not available anymore and I get the following error:
java.lang.ClassNotFoundException: com.mongodb.hadoop.MongoInputFormat
Does anyone know how to make the jars available to Spark in the IPython notebook? I'm pretty sure this is not specific to mongo, so maybe someone has already succeeded in adding jars to the classpath while using the notebook?
Answer
Very similar, please let me know if this helps: https://issues.apache.org/jira/browse/SPARK-5185
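For reference, a workaround often suggested around that issue is to put the jars on the classpath through Spark's configuration rather than on the pyspark command line. The snippet below is only a sketch, not something taken from the question or the linked ticket: the /path/to/ prefixes are placeholders, and whether PYSPARK_SUBMIT_ARGS is picked up depends on how the notebook creates its SparkContext.

# Option 1: register the jars in conf/spark-defaults.conf so every context picks them up
# (spark.driver.extraClassPath / spark.executor.extraClassPath are standard Spark properties;
#  the /path/to/ prefixes are placeholders)
spark.driver.extraClassPath     /path/to/mongo-hadoop-core-1.4.0.jar:/path/to/mongo-java-driver-3.0.2.jar
spark.executor.extraClassPath   /path/to/mongo-hadoop-core-1.4.0.jar:/path/to/mongo-java-driver-3.0.2.jar

# Option 2: export the submit arguments before launching the notebook so they are passed
# along when the SparkContext is created from Python (in Spark 1.4+ the string is expected
# to end with "pyspark-shell")
export PYSPARK_SUBMIT_ARGS="--jars mongo-hadoop-core-1.4.0.jar,mongo-java-driver-3.0.2.jar pyspark-shell"
IPYTHON_OPTS="notebook" spark-1.4.1/bin/pyspark

Either way, restart the notebook kernel afterwards so a fresh SparkContext is created with the new classpath.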