pyspark: Method isBarrier([]) does not exist


Problem description

I'm trying to learn Spark by following some hello-world level examples such as the one below, using pyspark. I got a "Method isBarrier([]) does not exist" error; the full error is included below the code.

from pyspark import SparkContext

if __name__ == '__main__':
    # Create a local SparkContext using 6 cores
    sc = SparkContext('local[6]', 'pySpark_pyCharm')
    # Build an RDD from a small list and run two simple actions
    rdd = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8])
    rdd.collect()
    rdd.count()

However, when I start a pyspark session directly from the command line and type in the same code, it works fine.

My setup:

  • Windows 10 Pro x64
  • Python 3.7.2
  • Spark 2.3.3 (Hadoop 2.7)
  • PySpark 2.4.0

Answer

The problem is an incompatibility between the versions of the Spark JVM libraries and PySpark. In general, the PySpark version has to match the version of your Spark installation exactly (while in theory matching major and minor versions should be enough, some incompatibilities in maintenance releases have been introduced in the past).

In other words, Spark 2.3.3 is not compatible with PySpark 2.4.0; you have to either upgrade Spark to 2.4.0 or downgrade PySpark to 2.3.3.
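
A quick way to confirm the mismatch is to compare the version reported by the pip-installed Python package with the version reported by the JVM side of the running context. The snippet below is a minimal sketch of that check; the pip command in the comment assumes PySpark was installed from PyPI:

import pyspark
from pyspark import SparkContext

# Version of the pip-installed Python package
print(pyspark.__version__)   # e.g. 2.4.0

# Version of the underlying Spark (JVM) installation
sc = SparkContext('local[1]', 'version_check')
print(sc.version)            # e.g. 2.3.3
sc.stop()

# If the two differ, align them, e.g. downgrade the PyPI package:
#   pip install pyspark==2.3.3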

Overall, PySpark is not designed to be used as a standalone library. While the PyPI package is a handy development tool (it is often easier to just install a package than to manually extend the PYTHONPATH), for actual deployments it is better to stick with the PySpark package bundled with the actual Spark deployment.
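
One common way to use the bundled PySpark from an IDE such as PyCharm, instead of the PyPI package, is the third-party findspark helper (not mentioned in the original answer), which locates the Spark installation via SPARK_HOME and puts its Python bindings on sys.path. This is a sketch assuming a hypothetical install path:

import os

# Point at the real Spark installation (hypothetical path)
os.environ['SPARK_HOME'] = r'C:\spark-2.3.3-bin-hadoop2.7'

# findspark (pip install findspark) adds SPARK_HOME's python/ directory
# and the bundled py4j to sys.path before pyspark is imported
import findspark
findspark.init()

from pyspark import SparkContext

sc = SparkContext('local[6]', 'pySpark_pyCharm')
print(sc.parallelize([1, 2, 3, 4]).count())  # 4
sc.stop()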
