Build Spark JavaRDD List from DropResult objects


Question

(What's possible in Scala should be possible in Java, right? I would welcome Scala suggestions as well.)

I am not trying to iterate over an RDD; instead, I need to build one with n elements produced by a random/simulator class, each of a type called DropResult. DropResult can't be cast to anything else.

I thought the Spark "find Pi" example had me on the right track, but no luck. Here's what I am trying:

A single DropResult is made from pld (a PipeLinkageData) like this:

DropResult dropResultSeed = pld.doDrop();

I am trying something like this:

JavaRDD<DropResult> simCountRDD = spark.parallelize(makeRangeList(1, getSimCount())).foreach(pld.doDrop());

I just need to run pld.doDrop() about 10^6 times on the cluster and put the results in a Spark RDD for the next operation, also on the cluster. I can't figure out what kind of function to use on "parallelize" to make this work.

makeRangeList:

private List<Integer> makeRangeList(int lower, int upper) {
    List<Integer> range = IntStream.range(lower, upper).boxed().collect(Collectors.toList());
    return range;    
}
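
One detail worth checking in makeRangeList: IntStream.range treats its upper bound as exclusive, so makeRangeList(1, n) yields n - 1 elements rather than n. If exactly getSimCount() elements are wanted, IntStream.rangeClosed is one option. The sketch below compares the two; the class name RangeDemo and the helper makeRangeListInclusive are hypothetical names introduced for illustration and are not part of the original post.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class RangeDemo {
    // Same logic as makeRangeList in the post: the upper bound is exclusive.
    static List<Integer> makeRangeList(int lower, int upper) {
        return IntStream.range(lower, upper).boxed().collect(Collectors.toList());
    }

    // Hypothetical variant (not from the post): the upper bound is inclusive.
    static List<Integer> makeRangeListInclusive(int lower, int upper) {
        return IntStream.rangeClosed(lower, upper).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(makeRangeList(1, 5).size());          // prints 4
        System.out.println(makeRangeListInclusive(1, 5).size()); // prints 5
    }
}
```

With a lower bound of 1, the inclusive variant produces exactly as many elements as the upper bound, which matches the "run it n times" intent described above.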

(FWIW I was trying to use the Pi example from http://spark.apache.org/examples.html as a model of how to do a for loop to create a JavaRDD)

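// count() returns a long, so the result is stored in a long
long count = spark.parallelize(makeRange(1, NUM_SAMPLES)).filter(new Function<Integer, Boolean>() {
  public Boolean call(Integer i) {
    double x = Math.random();
    double y = Math.random();
    return x*x + y*y < 1;
  }
}).count();
// 4.0 forces floating-point division; integer division would truncate the estimate
System.out.println("Pi is roughly " + 4.0 * count / NUM_SAMPLES);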

Solution

Yes, it seems like you should be able to do this fairly easily. It sounds like you just need to parallelize an RDD of 10^6 integers so that you can turn them into an RDD of 10^6 DropResult objects.

If that's the case, I don't think you need to explicitly create a list as above. You should be able to use makeRange() the way the Spark Pi example does, like this:

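// map() returns a new RDD with one DropResult per input integer; the integer itself is ignored.
// pld is captured by the function and shipped to the executors, so it has to be serializable.
JavaRDD<DropResult> simCountRDD = spark.parallelize(makeRange(1, getSimCount())).map(new Function<Integer, DropResult>() {
  public DropResult call(Integer i) {
    return pld.doDrop();
  }
});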
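
The reason map works where foreach did not can be seen without a cluster: JavaRDD.foreach returns void, so nothing can be assigned from it, whereas map returns a new collection with one output per input. The sketch below shows the same index-to-object mapping with plain Java streams instead of Spark, purely as a local analogy. MapSketch, DropResultStub, and SimulatorStub are hypothetical stand-ins, since the real DropResult and PipeLinkageData classes are not shown in the post.

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class MapSketch {
    // Hypothetical stand-in for DropResult (the real class is not shown in the post).
    static class DropResultStub {
        final double value;
        DropResultStub(double value) { this.value = value; }
    }

    // Hypothetical stand-in for PipeLinkageData, exposing a doDrop() method.
    static class SimulatorStub {
        private final Random random = new Random(42L);
        DropResultStub doDrop() { return new DropResultStub(random.nextDouble()); }
    }

    // Maps each index in 1..n to a freshly simulated result; the index is only a counter.
    static List<DropResultStub> simulate(SimulatorStub sim, int n) {
        return IntStream.rangeClosed(1, n)
                .mapToObj(i -> sim.doDrop())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<DropResultStub> results = simulate(new SimulatorStub(), 10);
        System.out.println(results.size()); // prints 10
    }
}
```

In the Spark version the shape is the same: the integers only determine how many times doDrop() is called, and the mapped RDD holds the results for the next cluster-side operation.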
