Is there any better method than collect to read an RDD in Spark?


Problem description


So, I want to read an RDD into an array. For that purpose, I could use the collect method. But that method is really annoying, as in my case it keeps giving Kryo buffer overflow errors. If I set the Kryo buffer size too high, it starts to cause its own problems. On the other hand, I have noticed that if I just save the RDD to a file using the saveAsTextFile method, I get no errors. So I was thinking there must be some better method of reading an RDD into an array that isn't as problematic as the collect method.
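For reference, a minimal sketch of the collect path described above, assuming a Kryo-serialized job whose buffer limit is raised via spark.kryoserializer.buffer.max (the app name, dataset, and buffer value are illustrative, not from the original question):

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative configuration: Kryo serialization with a larger max buffer,
// which is the knob the question refers to when "setting the Kryo buffer size".
val conf = new SparkConf()
  .setAppName("collect-example") // hypothetical app name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.max", "256m") // illustrative value
val sc = new SparkContext(conf)

val rdd = sc.parallelize(1 to 1000000) // stand-in for the real RDD

// collect pulls every partition back to the driver and materializes the
// whole dataset as a local Array -- this is where the memory pressure appears.
val asArray: Array[Int] = rdd.collect()

Raising the buffer only postpones the problem: the driver still has to hold the entire collected dataset in memory.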

Recommended answer


No. collect is the only method for reading an RDD into an array.


saveAsTextFile never has to collect all the data to one machine, so it is not limited by the available memory on a single machine in the same way that collect is.
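As a hedged sketch of that contrast (the SparkContext setup and output path below are illustrative, not from the answer): each executor writes its own partitions directly to storage, so the driver never materializes the full dataset.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("save-example")) // hypothetical app name
val rdd = sc.parallelize(1 to 1000000) // stand-in for the real RDD

// saveAsTextFile writes one part file per partition from the executors,
// so no single machine ever holds the whole dataset in memory.
rdd.saveAsTextFile("hdfs:///tmp/rdd-output") // illustrative output path

Reading the resulting part files back is then a separate step, which is why this route avoids the single-machine memory limit that collect runs into.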
