Spark Installation Problems - TypeError: an integer is required (got type bytes) - spark-2.4.5-bin-hadoop2.7, hadoop 2.7.1, python 3.8.2


Problem Description

I'm trying to install Spark on my 64-bit Windows machine. I have Python 3.8.2 installed and pip version 20.0.2. I downloaded spark-2.4.5-bin-hadoop2.7, set the HADOOP_HOME and SPARK_HOME environment variables, and added pyspark to the PATH variable. When I run pyspark from cmd I see the error given below:

C:\Users\aa>pyspark
Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 23:03:10) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "C:\Users\aa\Downloads\spark-2.4.5-bin-hadoop2.7\spark-2.4.5-bin-hadoop2.7\python\pyspark\shell.py", line 31, in <module>
    from pyspark import SparkConf
  File "C:\Users\aa\Downloads\spark-2.4.5-bin-hadoop2.7\spark-2.4.5-bin-hadoop2.7\python\pyspark\__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "C:\Users\aa\Downloads\spark-2.4.5-bin-hadoop2.7\spark-2.4.5-bin-hadoop2.7\python\pyspark\context.py", line 31, in <module>
    from pyspark import accumulators
  File "C:\Users\aa\Downloads\spark-2.4.5-bin-hadoop2.7\spark-2.4.5-bin-hadoop2.7\python\pyspark\accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "C:\Users\aa\Downloads\spark-2.4.5-bin-hadoop2.7\spark-2.4.5-bin-hadoop2.7\python\pyspark\serializers.py", line 72, in <module>
    from pyspark import cloudpickle
  File "C:\Users\aa\Downloads\spark-2.4.5-bin-hadoop2.7\spark-2.4.5-bin-hadoop2.7\python\pyspark\cloudpickle.py", line 145, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "C:\Users\aa\Downloads\spark-2.4.5-bin-hadoop2.7\spark-2.4.5-bin-hadoop2.7\python\pyspark\cloudpickle.py", line 126, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)

I also want to import pyspark in my Python code in PyCharm, but after I run my code file I get the same TypeError: an integer is required (got type bytes). I uninstalled Python 3.8.2 and tried Python 2.7, but in that case I got a deprecation error. I got the error given below and updated the pip installer.

Could not find a version that satisfies the requirement pyspark (from versions: )
No matching distribution found for pyspark 

Then I ran python -m pip install --upgrade pip to update pip, but I ran into TypeError: an integer is required (got type bytes) again.

C:\Users\aa>python --version
Python 3.8.2

C:\Users\aa>pip --version
pip 20.0.2 from c:\users\aa\appdata\local\programs\python\python38\lib\site-packages\pip (python 3.8)

C:\Users\aa>java --version
java 14 2020-03-17
Java(TM) SE Runtime Environment (build 14+36-1461)
Java HotSpot(TM) 64-Bit Server VM (build 14+36-1461, mixed mode, sharing)

How can I fix and overcome this problem? Currently I have spark-2.4.5-bin-hadoop2.7 and Python 3.8.2. Thanks in advance!

Recommended Answer

This is a compatibility problem between Python 3.8 and this Spark version; see: https://github.com/apache/spark/pull/26194.
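
The root cause is that Python 3.8 inserted a new leading posonlyargcount parameter into the types.CodeType constructor, so the old cloudpickle bundled with Spark 2.4.5, which calls the constructor positionally, shifts every argument by one slot and the bytecode (a bytes object) lands where an integer is expected. A minimal sketch of the issue and of a version-safe way to derive a modified code object on Python 3.8+ (the example name "<patched>" is illustrative, not from cloudpickle):

```python
import sys

# Python 3.8 added `posonlyargcount` as the first argument of
# types.CodeType, which is why pre-3.8 positional constructor
# calls raise: TypeError: an integer is required (got type bytes)
code = compile("x = 1", "<string>", "exec")

if sys.version_info >= (3, 8):
    # On 3.8+ a portable way to build a modified code object is
    # CodeType.replace(), which takes keyword arguments only and
    # is therefore immune to the argument-order change.
    patched = code.replace(co_filename="<patched>")
else:
    patched = code

print(patched.co_filename)
```

This is why replacing the bundled cloudpickle.py with a release that knows about the new constructor signature makes the import work again.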

To make it functional (to a certain extent) you need to:

  • Replace the cloudpickle.py file in your pyspark directory with its 1.1.1 version, found at: https://github.com/cloudpipe/cloudpickle/blob/v1.1.1/cloudpickle/cloudpickle.py.
  • Edit the cloudpickle.py file to add:
def print_exec(stream):
    # sys and traceback must be imported at the top of cloudpickle.py
    ei = sys.exc_info()
    traceback.print_exception(ei[0], ei[1], ei[2], None, stream)
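
To see what this helper does, here is a self-contained sketch (the io.StringIO buffer and the deliberate division by zero are illustrative, not part of cloudpickle): print_exec writes the traceback of the currently handled exception to the given stream, which is the function pyspark's serializers module expects to import from cloudpickle.

```python
import io
import sys
import traceback

def print_exec(stream):
    # Write the currently-handled exception's traceback to `stream`,
    # matching the helper added back into cloudpickle.py.
    ei = sys.exc_info()
    traceback.print_exception(ei[0], ei[1], ei[2], None, stream)

buf = io.StringIO()
try:
    1 / 0  # illustrative error so there is an exception to report
except ZeroDivisionError:
    print_exec(buf)

print("ZeroDivisionError" in buf.getvalue())
```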

You'll then be able to import pyspark.

