pickle.PicklingError: Cannot pickle files that are not opened for reading

Problem description

I'm getting this error while running a PySpark job on Dataproc. What could be the reason?

Here is the error's stack trace:

  File "/usr/lib/python2.7/pickle.py", line 331, in save
  self.save_reduce(obj=obj, *rv)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", 
  line 553, in save_reduce
  File "/usr/lib/python2.7/pickle.py", line 286, in save
  f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
  self._batch_setitems(obj.iteritems())
  File "/usr/lib/python2.7/pickle.py", line 681, in _batch_setitems
  save(v)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
  f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", 
  line 582, in save_file
  pickle.PicklingError: Cannot pickle files that are not opened for reading

Recommended answer

I found the issue: I was using a dictionary in the map function, and it was failing because the worker nodes couldn't access the dictionary I was passing into the map function.

Solution:

I broadcast the dictionary and then used it in the map function:

from pyspark import SparkContext

sc = SparkContext()
lookup_bc = sc.broadcast(lookup_dict)

Then, inside the function, I retrieved the value like this:

data = lookup_bc.value.get(key)
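
For context, a minimal self-contained sketch of the whole pattern follows; the names lookup_dict, lookup_bc, map_with_lookup, and the sample data are illustrative, not from the original job:

from pyspark import SparkContext

sc = SparkContext(appName="broadcast-lookup-sketch")

# Illustrative lookup table; in the original job, the dictionary captured
# directly in the map closure could not be pickled for the workers.
lookup_dict = {"a": 1, "b": 2, "c": 3}

# Broadcast once from the driver; each worker reads lookup_bc.value locally.
lookup_bc = sc.broadcast(lookup_dict)

def map_with_lookup(key):
    # Look up through the broadcast handle, not the raw driver-side dict.
    return (key, lookup_bc.value.get(key))

print(sc.parallelize(["a", "b", "x"]).map(map_with_lookup).collect())
# [('a', 1), ('b', 2), ('x', None)]

sc.stop()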

Hope it helps!
