Pyspark got TypeError: can't pickle _abc_data objects
Problem description
I'm trying to generate predictions from a pickled model with pyspark. I load the model with the following command:
model = deserialize_python_object(filename)
with deserialize_python_object defined as:
import pickle

def deserialize_python_object(filename):
    try:
        with open(filename, 'rb') as f:
            obj = pickle.load(f)
    except Exception:  # on any failure, fall back to None
        obj = None
    return obj
The error log looks like this:
File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py", line 189, in wrapper
return self(*args)
File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py", line 167, in __call__
judf = self._judf
File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py", line 151, in _judf
self._judf_placeholder = self._create_judf()
File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py", line 160, in _create_judf
wrapped_func = _wrap_function(sc, self.func, self.returnType)
File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/sql/udf.py", line 35, in _wrap_function
pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/rdd.py", line 2420, in _prepare_for_python_RDD
pickled_command = ser.dumps(command)
File "/Users/gmg/anaconda3/envs/env/lib/python3.7/site-packages/pyspark/serializers.py", line 600, in dumps
raise pickle.PicklingError(msg)
_pickle.PicklingError: Could not serialize object: TypeError: can't pickle _abc_data objects
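The underlying problem is that classes built on `abc` carry a C-level `_abc_data` object (exposed as the `_abc_impl` attribute in CPython 3.7+) holding the ABC registry/cache, and the pickle machinery refuses to serialize it. A minimal stdlib-only sketch of the failure (the `Model` class here is purely illustrative, standing in for whatever ABC-based class the pickled model uses):

```python
import abc
import pickle

class Model(abc.ABC):  # illustrative ABC, not the asker's actual model class
    @abc.abstractmethod
    def predict(self, x): ...

# ABCMeta stores its subclass registry/cache in a C-level _abc_data
# object on the class; pickle cannot serialize it.
try:
    pickle.dumps(Model._abc_impl)
except TypeError as exc:
    print(exc)  # wording varies by Python version, e.g. "cannot pickle '_abc_data' object"
```

An outdated cloudpickle tries to pickle a class's full `__dict__`, including this attribute, which is what surfaces as the `PicklingError` above.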
Recommended answer
It seems you are running into the same problem as in this issue: https://github.com/cloudpipe/cloudpickle/issues/180

What is happening is that the cloudpickle library bundled with pyspark is outdated for Python 3.7; you can work around the problem with the patch below until pyspark updates that module.
Try this workaround. Install cloudpickle:
pip install cloudpickle
Add this to your code:
import cloudpickle
import pyspark.serializers
pyspark.serializers.cloudpickle = cloudpickle
monkeypatch credit: https://github.com/cloudpipe/cloudpickle/issues/305
This concludes the article on "Pyspark got TypeError: can't pickle _abc_data objects". We hope the recommended answer was helpful, and thank you for supporting IT屋!