How to convert an RDD to a list efficiently without using the collect function


Question

We know that to convert an RDD to a list we should use collect(). But this function puts a lot of stress on the driver, because it brings all the data from the different executors to the driver, which degrades performance or, worse, can make the whole application fail.

Is there any other way to convert an RDD into one of the java.util collections, without using collect() or collectAsMap() etc., that does not degrade performance?

Basically, in the current scenario where we deal with huge amounts of data in batch or stream processing, APIs like collect() and collectAsMap() have become completely useless in a real project with a real amount of data. We can use them in demo code, but that is all they are good for. So why have an API that we cannot even use (or am I missing something)?

Is there a better way to achieve the same result through some other method, or can collect() and collectAsMap() be implemented more efficiently than by just calling

List<String> myList = RDD.collect.toList (which hurts performance)

I searched Google but could not find anything effective. Please help if someone has a better approach.

Recommended answer

Is there any other way to convert an RDD into one of the java.util collections, without using collect() or collectAsMap() etc., that does not degrade performance?

No, and there can't be. If there were such a way, collect would have been implemented using it in the first place.

Well, technically you could implement the List interface on top of an RDD (or most of it?), but that would be a bad idea and quite pointless.

So why have an API that we cannot even use (or am I missing something)?

collect is intended for cases where only the inputs or intermediate results are large RDDs and the output is small enough to fit on the driver. If that's not your case, use foreach or other actions instead.
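To make the "small output" point concrete, here is a rough sketch in plain Java. It does not use Spark at all: the nested lists are only a stand-in for an RDD's partitions, and the class and method names (`CollectSketch`, `countWords`, `takeFirst`, `forEachPartition`) are hypothetical, chosen just for illustration. The idea is that the driver should only ever receive something small: an aggregate, a bounded sample, or nothing at all.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Plain-Java stand-in for an RDD: each inner list plays the role of one
// partition that would live on an executor. No Spark classes are used here.
class CollectSketch {

    // Pattern 1: aggregate inside each partition, then merge only the small
    // per-partition results on the "driver".
    static Map<String, Long> countWords(List<List<String>> partitions) {
        Map<String, Long> driverResult = new HashMap<>();
        for (List<String> partition : partitions) {
            Map<String, Long> partial = new HashMap<>(); // built "on the executor"
            for (String word : partition) {
                partial.merge(word, 1L, Long::sum);
            }
            // Only the compact partial map crosses over to the driver.
            partial.forEach((word, n) -> driverResult.merge(word, n, Long::sum));
        }
        return driverResult;
    }

    // Pattern 2: fetch a bounded sample instead of every element.
    static List<String> takeFirst(List<List<String>> partitions, int n) {
        List<String> sample = new ArrayList<>();
        for (List<String> partition : partitions) {
            for (String element : partition) {
                if (sample.size() == n) {
                    return sample;
                }
                sample.add(element);
            }
        }
        return sample;
    }

    // Pattern 3: handle each partition where it lives and never build a full
    // list; the sink could write to a file, a database, a queue, and so on.
    static void forEachPartition(List<List<String>> partitions,
                                 Consumer<List<String>> sink) {
        for (List<String> partition : partitions) {
            sink.accept(partition);
        }
    }
}
```

In Spark itself these three patterns correspond roughly to an aggregation such as reduceByKey followed by collectAsMap(), to take(n), and to foreachPartition. If the driver really has to walk over every element, toLocalIterator() brings the data over partition by partition instead of all at once, though it is still slow for large data.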
