Build Spark JavaRDD List from DropResult objects
Problem description
(What's possible in Scala should be possible in Java, right? I would take Scala suggestions as well.)
I am not trying to iterate over an RDD; instead, I need to build one with n elements of a type called DropResult, each produced by a random/simulator class. DropResult can't be cast into anything else.
I thought the Spark "find PI" example had me on the right track but no luck. Here's what I am trying:
On a one-time basis a DropResult is made like this:
// make a single DropResult from pld (PipeLinkageData)
DropResult dropResultSeed = pld.doDrop();
I am trying something like this:
JavaRDD<DropResult> simCountRDD = spark.parallelize(makeRangeList(1, getSimCount())).foreach(pld.doDrop());
I just need to run pld.doDrop() about 10^6 times on the cluster and put the results in a Spark RDD for the next operation, also on the cluster. I can't figure out what kind of function to use on "parallelize" to make this work.
makeRangeList:
private List<Integer> makeRangeList(int lower, int upper) {
    List<Integer> range = IntStream.range(lower, upper).boxed().collect(Collectors.toList());
    return range;
}
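One detail in the helper above may matter for the sample count: IntStream.range excludes its upper bound, so makeRangeList(1, n) yields n - 1 integers rather than n. The sketch below contrasts it with IntStream.rangeClosed; the class name RangeListSketch and the method makeClosedRangeList are hypothetical names introduced only for this illustration.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class RangeListSketch {
    // Same logic as the question's helper: range() excludes the upper bound.
    static List<Integer> makeRangeList(int lower, int upper) {
        return IntStream.range(lower, upper).boxed().collect(Collectors.toList());
    }

    // Hypothetical variant: rangeClosed() includes the upper bound.
    static List<Integer> makeClosedRangeList(int lower, int upper) {
        return IntStream.rangeClosed(lower, upper).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(makeRangeList(1, 5));       // prints [1, 2, 3, 4]
        System.out.println(makeClosedRangeList(1, 5)); // prints [1, 2, 3, 4, 5]
    }
}
```

If exactly getSimCount() drops are wanted, either the closed variant or a lower bound of 0 with the original helper would give that count.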
(FWIW I was trying to use the Pi example from http://spark.apache.org/examples.html as a model of how to do a for loop to create a JavaRDD)
// count() returns a long
long count = spark.parallelize(makeRange(1, NUM_SAMPLES)).filter(new Function<Integer, Boolean>() {
    public Boolean call(Integer i) {
        double x = Math.random();
        double y = Math.random();
        return x*x + y*y < 1;
    }
}).count();
// 4.0 forces floating-point division; integer division would truncate the estimate
System.out.println("Pi is roughly " + 4.0 * count / NUM_SAMPLES);
Yes, it seems like you should be able to do this pretty easily. It sounds like you just need to parallelize a collection of 10^6 integers so that you can map each one to a DropResult, which gives you an RDD of 10^6 DropResult objects.
If this is the case, I don't think you need to explicitly create a list as above. The main change is to use map(), which returns a new RDD, rather than foreach(), which returns nothing. It seems like you should just be able to use makeRange() the way the Spark Pi example does, like this:
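// pld is captured by this function and shipped to the executors, so it needs to be Serializable
JavaRDD<DropResult> simCountRDD = spark.parallelize(makeRange(1, getSimCount())).map(new Function<Integer, DropResult>() {
    // the Integer is only a counter; each call produces one new DropResult
    public DropResult call(Integer i) {
        return pld.doDrop();
    }
});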
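The same shape can be tried without a cluster using java.util.stream. This is only a rough local analogy and not Spark code: the integers act purely as a counter, and the mapping step returns one result per integer that can be collected and kept. DropResult and Simulator below are hypothetical stand-ins for the asker's DropResult and PipeLinkageData classes, which are not shown in the question.

```java
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class LocalDropSketch {
    // Hypothetical stand-in for the asker's DropResult class.
    static final class DropResult {
        final double value;
        DropResult(double value) { this.value = value; }
    }

    // Hypothetical stand-in for PipeLinkageData: each doDrop() call yields a new result.
    static final class Simulator {
        private final Random random = new Random();
        DropResult doDrop() { return new DropResult(random.nextDouble()); }
    }

    // Runs doDrop() simCount times; the index i is only a counter and is ignored.
    static List<DropResult> runDrops(Simulator pld, int simCount) {
        return IntStream.rangeClosed(1, simCount)
                .mapToObj(i -> pld.doDrop())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<DropResult> results = runDrops(new Simulator(), 1000);
        System.out.println(results.size()); // prints 1000
    }
}
```

In Spark the collect step is replaced by keeping the mapped RDD for the next operation on the cluster, as in the answer above.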