Pyspark got TypeError: can't pickle _abc_data objects


Problem description

I'm trying to generate predictions from a pickled model with pyspark. I load the model with the following command:

model = deserialize_python_object(filename)

with deserialize_python_object(filename) defined as:

import pickle

def deserialize_python_object(filename):
    try:
        with open(filename, 'rb') as f:
            obj = pickle.load(f)
    except:
        obj = None
    return obj

The error log looks like this:

File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py", line 189, in wrapper
    return self(*args)
  File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py", line 167, in __call__
    judf = self._judf
  File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py", line 151, in _judf
    self._judf_placeholder = self._create_judf()
  File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py", line 160, in _create_judf
    wrapped_func = _wrap_function(sc, self.func, self.returnType)
  File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py", line 35, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
  File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/rdd.py", line 2420, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
  File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/serializers.py", line 600, in dumps
    raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: TypeError: can't pickle _abc_data objects
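Note from the traceback that the model itself loads fine; the failure happens later, when Spark tries to pickle the UDF closure that captures it. On Python 3.7+ the abc module is backed by a C implementation, and the per-class `_abc_impl` registry it stores is exactly the `_abc_data` object the error complains about. A minimal stdlib-only reproduction of that limitation (no pyspark needed; the `Model` class here is a hypothetical stand-in for any class built on `abc.ABC`):

```python
import abc
import pickle

class Model(abc.ABC):
    """Hypothetical stand-in for a model class derived from abc.ABC."""

# On CPython 3.7+, every ABC subclass carries a C-level registry
# object in its class dict; this is the `_abc_data` from the error.
impl = Model.__dict__['_abc_impl']

try:
    pickle.dumps(impl)
except TypeError as e:
    print(e)  # e.g. "cannot pickle '_abc._abc_data' object"
```

The stdlib pickler simply has no reduction rule for this object, which is why a serializer that handles it (a newer cloudpickle) is needed.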

Answer

It seems that you are having the same problem as in this issue: https://github.com/cloudpipe/cloudpickle/issues/180

What is happening is that the cloudpickle library bundled with pyspark is outdated for Python 3.7; until pyspark updates that module, you can work around the problem with the patch below.

Try this workaround:

  1. Install cloudpickle: pip install cloudpickle

  2. Add this to your code:

import cloudpickle
import pyspark.serializers
pyspark.serializers.cloudpickle = cloudpickle

Monkeypatch credit: https://github.com/cloudpipe/cloudpickle/issues/305
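The patch works because pyspark resolves cloudpickle through the `pyspark.serializers` module attribute at call time, so rebinding that one attribute swaps the serializer everywhere without touching pyspark's own code. A stdlib-only sketch of the same late-binding mechanism (the `serializers` module, `wrap_function`, and `UpgradedPickle` below are hypothetical stand-ins, not pyspark's actual internals):

```python
import pickle
import types

# Hypothetical stand-in for pyspark.serializers: a module whose
# `cloudpickle` attribute other code consults at call time.
serializers = types.ModuleType("serializers")
serializers.cloudpickle = pickle  # the bundled, "outdated" serializer

def wrap_function(func):
    # Resolves serializers.cloudpickle when it runs, not when it is
    # defined, so a later rebinding takes effect.
    return serializers.cloudpickle.dumps(func)

class UpgradedPickle:
    """Hypothetical stand-in for the standalone cloudpickle package."""
    calls = 0

    @staticmethod
    def dumps(obj):
        UpgradedPickle.calls += 1
        return pickle.dumps(obj)

# The monkeypatch: rebind one module attribute.
serializers.cloudpickle = UpgradedPickle

payload = wrap_function(len)
print(UpgradedPickle.calls)          # 1 -> the replacement was used
print(pickle.loads(payload) is len)  # True
```

Because the lookup is deferred, the two patch lines only need to run once, before any UDF is defined.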

