Pickling a Spark RDD and reading it into Python
Problem description
I am trying to serialize a Spark RDD by pickling it, and then read the pickled file directly into Python.
a = sc.parallelize(['1','2','3','4','5'])
a.saveAsPickleFile('test_pkl')
I then copy the test_pkl files to my local machine. How can I read them directly into Python? When I try the normal pickle package, it fails as soon as I attempt to read the first pickle part of 'test_pkl':
pickle.load(open('part-00000','rb'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/pickle.py", line 1370, in load
return Unpickler(file).load()
File "/usr/lib64/python2.6/pickle.py", line 858, in load
dispatch[key](self)
File "/usr/lib64/python2.6/pickle.py", line 970, in load_string
raise ValueError, "insecure string pickle"
ValueError: insecure string pickle
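This failure is expected: saveAsPickleFile does not write a raw pickle stream. It stores the pickled objects inside a Hadoop SequenceFile, so each part file begins with the SequenceFile magic bytes b'SEQ' rather than a pickle opcode. A minimal simulation (using a stand-in file with a fabricated header, since the real part-00000 is not available here) reproduces the kind of failure shown above:

```python
import pickle
import tempfile

# Stand-in for a Spark part file: Hadoop SequenceFiles begin with the
# magic bytes b'SEQ' plus a version byte, not with pickle opcodes.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b'SEQ\x06!org.apache.hadoop.io.BytesWritable')
    path = f.name

# The leading b'S' is interpreted as pickle's STRING opcode, which
# expects a quoted string. That mismatch is what produced the
# "insecure string pickle" ValueError in Python 2; Python 3 raises
# a pickle.UnpicklingError (or similar) for the same header.
try:
    with open(path, 'rb') as f:
        pickle.load(f)
    error = None
except Exception as exc:
    error = exc

print(type(error).__name__)
```

In other words, the bytes on disk are a SequenceFile container around pickled records, which the plain pickle module does not know how to unwrap.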
I assume that the pickling method Spark is using is different from the Python pickle method (correct me if I am wrong). Is there any way for me to pickle data from Spark and read this pickled object directly into Python from the file?
Answer
This is possible using the sparkpickle project. It is as simple as:
with open("/path/to/file", "rb") as f:
print(sparkpickle.load(f))
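If you control the Spark job and the dataset fits in driver memory, an alternative worth noting (a sketch, not part of the original answer) is to collect the RDD on the driver and write an ordinary pickle file with the standard library, which plain pickle.load can then read back without any extra package:

```python
import pickle
import tempfile

# Stand-in for rdd.collect(); in a real job this would be
# data = sc.parallelize(['1', '2', '3', '4', '5']).collect()
data = ['1', '2', '3', '4', '5']

with tempfile.NamedTemporaryFile(suffix='.pkl', delete=False) as f:
    pickle.dump(data, f)  # a raw pickle stream this time, no container
    path = f.name

with open(path, 'rb') as f:
    restored = pickle.load(f)  # works: no SequenceFile wrapper
```

The trade-off is that collect() pulls everything to the driver, so this only suits data small enough to fit on one machine; for large RDDs, sparkpickle (or reading the files back with sc.pickleFile inside Spark) is the better route.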