MongoDB Spark Connector py4j.protocol.Py4JJavaError: An error occurred while calling o50.load

Problem Description

I have been able to load this MongoDB database before, but am now receiving an error I haven't been able to figure out.

Here is how I start my Spark session:

spark = SparkSession.builder \
        .master("local[*]") \
        .appName("collab_rec") \
        .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/example.collection") \
        .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/example.collection") \
        .getOrCreate()

I run this script, which loads the Mongo Spark connector package, so that I can interact with Spark through IPython:

#!/bin/bash
export PYSPARK_DRIVER_PYTHON=ipython

# Note: repeated --packages flags override one another, so all
# coordinates must go in a single comma-separated list.
${SPARK_HOME}/bin/pyspark \
--master local[4] \
--executor-memory 1G \
--driver-memory 1G \
--conf spark.sql.warehouse.dir="file:///tmp/spark-warehouse" \
--packages com.databricks:spark-csv_2.11:1.5.0,com.amazonaws:aws-java-sdk-pom:1.10.34,org.apache.hadoop:hadoop-aws:2.7.3,org.mongodb.spark:mongo-spark-connector_2.11:2.0.0

Spark loads fine, and the packages appear to load correctly as well.

Here is how I attempt to load that database into a DataFrame:

df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()

However, on that line, I receive the following error:

Py4JJavaError: An error occurred while calling o46.load.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.analysis.TypeCoercion$.findTightestCommonTypeOfTwo()Lscala/Function2;
    at com.mongodb.spark.sql.MongoInferSchema$.com$mongodb$spark$sql$MongoInferSchema$$compatibleType(MongoInferSchema.scala:132)
    at com.mongodb.spark.sql.MongoInferSchema$$anonfun$3.apply(MongoInferSchema.scala:76)
    at com.mongodb.spark.sql.MongoInferSchema$$anonfun$3.apply(MongoInferSchema.scala:76)

From what I can see in the following documentation/tutorial, I am attempting to load the DataFrame correctly:

https://docs.mongodb.com/spark-connector/master/python-api/
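
The same documentation also shows passing the connection URI explicitly per read, instead of relying on the session-level spark.mongodb.input.uri config; a minimal sketch, reusing the URI from the session above:

# Read the collection, passing the URI as a per-read option
# rather than through the SparkSession config.
df = spark.read.format("com.mongodb.spark.sql.DefaultSource") \
        .option("uri", "mongodb://127.0.0.1/example.collection") \
        .load()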

I am using Spark 2.2.0. Note that I have been able to replicate this error both on my Mac and on Linux through AWS.

Recommended Answer

I figured out the answer to my question. This was a compatibility issue between the Mongo-Spark connector and the version of Spark I had upgraded to. Specifically, the findTightestCommonTypeOfTwo method was renamed in this PR:

https://github.com/apache/spark/pull/16786/files

For Spark 2.2.0, the compatible Mongo-Spark connector is also version 2.2.0, so in my example the package would be loaded like this:

--packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0
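
Since repeated --packages flags override one another (see the launch script above), applying the fix to the full script just means bumping the connector's version in the single comma-separated list:

--packages com.databricks:spark-csv_2.11:1.5.0,com.amazonaws:aws-java-sdk-pom:1.10.34,org.apache.hadoop:hadoop-aws:2.7.3,org.mongodb.spark:mongo-spark-connector_2.11:2.2.0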

This could change in the future, so when using the connector you should check its compatibility with the version of Spark being used.
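
With a matching connector on the classpath, the original read should go through; a quick sanity check, assuming the same example.collection URI configured in the session at the top:

# Load via the session-level URI config and print the inferred schema
# to confirm the connector and Spark versions are compatible.
df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
df.printSchema()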
