What is the difference between the Spark JavaRDD methods collect() and collectAsync()?

Question

I am exploring the Spark 2.0 Java API and have a question about the collect() and collectAsync() methods available on JavaRDD. How do they differ?

Recommended answer

collect():

It returns all of the elements in this RDD to the driver. In the Java API the result is a java.util.List<T> (the Scala RDD.collect() returns an array).

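import java.util.Arrays;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;

// sc is an existing JavaSparkContext
List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
JavaRDD<Integer> rdd = sc.parallelize(data, 1);
List<Integer> result = rdd.collect();
// collect() runs the job and copies every element of the RDD to the driver;
// the calling thread blocks until the action completes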


collectAsync():

The asynchronous version of collect(). It returns a JavaFutureAction, which extends java.util.concurrent.Future, for retrieving a list containing all of the elements in this RDD.

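import java.util.Arrays;
import java.util.List;
import org.apache.spark.api.java.JavaFutureAction;
import org.apache.spark.api.java.JavaRDD;

// sc is an existing JavaSparkContext
List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
JavaRDD<Integer> rdd = sc.parallelize(data, 1);
JavaFutureAction<List<Integer>> future = rdd.collectAsync();
// returns a future immediately, without the data; the job runs in the
// background and the calling thread is free to continue

List<Integer> result = future.get();
// get() blocks until the job finishes (it may throw InterruptedException or ExecutionException)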
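
Because JavaFutureAction extends java.util.concurrent.Future, the usual Future methods such as isDone() and get() with a timeout are available on the object collectAsync() returns. The sketch below shows that polling pattern with plain JDK classes so it runs without Spark. fakeCollectAsync is a hypothetical stand-in for rdd.collectAsync() (a CompletableFuture with a made-up delay), and tryGet is a hypothetical helper; neither is part of Spark.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FuturePollingSketch {

    // Hypothetical stand-in for rdd.collectAsync(): the work starts on a
    // background thread and the caller gets a Future back immediately.
    static Future<List<Integer>> fakeCollectAsync(List<Integer> data, long delayMs) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(delayMs); // simulate the job's latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return data;
        });
    }

    // Waits at most timeoutMs; returns Optional.empty() if the result is not ready yet.
    static Optional<List<Integer>> tryGet(Future<List<Integer>> future, long timeoutMs) {
        try {
            return Optional.of(future.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            return Optional.empty(); // still running: the caller can try again later
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause()); // the background work itself failed
        }
    }

    public static void main(String[] args) {
        Future<List<Integer>> future = fakeCollectAsync(Arrays.asList(1, 2, 3, 4, 5), 1000);
        System.out.println("done right away? " + future.isDone());
        System.out.println("after 10 ms: " + tryGet(future, 10));
        System.out.println("after waiting: " + tryGet(future, 5000).orElse(null));
    }
}
```

With a real JavaFutureAction the calls look the same, since isDone() and get(long, TimeUnit) come from the Future interface.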

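Both methods run the same job and bring every element of the RDD back to the driver. The only difference is how the caller receives the data: synchronously (with collect(), the thread waits until the action completes) or asynchronously (with collectAsync(), the thread gets a Future object right away and moves on to the next statement, blocking only when it calls get()).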
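
The practical effect of that difference can be sketched with plain JDK concurrency, with no Spark involved. slowCollect and otherDriverWork are hypothetical stand-ins with made-up 300 ms delays, and the single-thread ExecutorService loosely plays the role Spark's scheduler plays for collectAsync(). In the blocking version the two delays add up. In the asynchronous version they overlap, so the total is roughly the longer of the two.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SyncVsAsyncSketch {

    static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Hypothetical stand-in for the job behind collect()/collectAsync().
    static List<Integer> slowCollect() {
        sleep(300);
        return Arrays.asList(1, 2, 3, 4, 5);
    }

    // Hypothetical stand-in for unrelated work the driver thread could do.
    static void otherDriverWork() {
        sleep(300);
    }

    // collect()-style: the calling thread blocks on the job before doing anything else.
    static long runSync() {
        long start = System.nanoTime();
        List<Integer> result = slowCollect(); // blocks here
        otherDriverWork();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // collectAsync()-style: the job runs on another thread while this one keeps going.
    static long runAsync() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            long start = System.nanoTime();
            Callable<List<Integer>> job = SyncVsAsyncSketch::slowCollect;
            Future<List<Integer>> future = pool.submit(job); // returns immediately
            otherDriverWork();                               // overlaps with the job
            List<Integer> result = future.get();             // blocks only for the time left
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("sync  (collect-style):      " + runSync() + " ms");
        System.out.println("async (collectAsync-style): " + runAsync() + " ms");
    }
}
```

If the driver has nothing else to do, calling collectAsync() and then get() right away gives no benefit over collect(); the gain only appears when other work fits in between.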
