Build Spark JavaRDD List from DropResult objects

Views: 182 Published: 2016/5/22 16:20:03 Tags: java apache-spark


Problem description

(What's possible in Scala should be possible in Java, right? But I would take Scala suggestions as well)

I am not trying to iterate over an RDD; instead, I need to build one with n elements produced by a random/simulator class, each of a type called DropResult. DropResult can't be cast into anything else.

I thought the Spark "find PI" example had me on the right track but no luck. Here's what I am trying:

A single DropResult is made from pld (a PipeLinkageData instance) like this:

DropResult dropResultSeed = pld.doDrop();

I am trying something like this:

JavaRDD<DropResult> simCountRDD = spark.parallelize(makeRangeList(1, getSimCount())).foreach(pld.doDrop());

I just need to run pld.doDrop() about 10^6 times on the cluster and put the results in a Spark RDD for the next operation, also on the cluster. I can't figure out what kind of function to use on "parallelize" to make this work.

makeRangeList:

private List<Integer> makeRangeList(int lower, int upper) {
    List<Integer> range = IntStream.range(lower, upper).boxed().collect(Collectors.toList());
    return range;    
}
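
One thing to watch with this helper: IntStream.range is half-open, so the upper bound is excluded and makeRangeList(1, n) yields n - 1 elements rather than n. The sketch below is only an illustration of that behavior and is not part of the original question; the class name RangeDemo and the method makeClosedRangeList are made up for the example.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class RangeDemo {
    // Same logic as makeRangeList in the question: the upper bound is excluded.
    static List<Integer> makeRangeList(int lower, int upper) {
        return IntStream.range(lower, upper).boxed().collect(Collectors.toList());
    }

    // Hypothetical variant: the upper bound is included.
    static List<Integer> makeClosedRangeList(int lower, int upper) {
        return IntStream.rangeClosed(lower, upper).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(makeRangeList(1, 5).size());       // 4 elements: 1, 2, 3, 4
        System.out.println(makeClosedRangeList(1, 5).size()); // 5 elements: 1, 2, 3, 4, 5
    }
}
```

If exactly getSimCount() simulations are needed, either a range starting at 0 or the closed variant would give the intended count.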

(FWIW I was trying to use the Pi example from http://spark.apache.org/examples.html as a model of how to do a for loop to create a JavaRDD)

long count = spark.parallelize(makeRange(1, NUM_SAMPLES)).filter(new Function<Integer, Boolean>() {
  public Boolean call(Integer i) {
    double x = Math.random();
    double y = Math.random();
    return x*x + y*y < 1;
  }
}).count(); // count() returns a long
// 4.0 forces floating-point division; with integer operands the result truncates to 3
System.out.println("Pi is roughly " + 4.0 * count / NUM_SAMPLES);
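
A side note on the last line of this example: the estimate only comes out fractional if the arithmetic is done in floating point, because integer division truncates. The sketch below shows the difference with a made-up hit count; PiEstimateDemo and its method names are hypothetical and do not come from the Spark example.

```java
class PiEstimateDemo {
    // Integer arithmetic: the division truncates toward zero.
    static long truncatedEstimate(long count, long samples) {
        return 4 * count / samples;
    }

    // Floating-point arithmetic: 4.0 promotes the whole expression to double.
    static double fractionalEstimate(long count, long samples) {
        return 4.0 * count / samples;
    }

    public static void main(String[] args) {
        long samples = 1000;
        long count = 785; // made-up hit count, for illustration only
        System.out.println(truncatedEstimate(count, samples));  // 3
        System.out.println(fractionalEstimate(count, samples)); // 3.14
    }
}
```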

Solution

Yes, it seems like you should be able to do this pretty easily. It sounds like you just need to parallelize an RDD of 10^6 integers so that you can map each one to a DropResult, which gives you an RDD of 10^6 DropResult objects.
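
For context on why the attempt in the question does not work: JavaRDD.foreach is an action that returns void, so its result cannot be assigned to a JavaRDD, and pld.doDrop() passed as an argument is a DropResult value rather than a function to run per element. map is the transformation that turns each element into a new value. The sketch below shows the same distinction with java.util.stream instead of Spark, so it runs without a cluster; DropResult and PipeLinkageData here are hypothetical stand-ins, since the real classes are not shown in the question.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class MapVersusForEachDemo {
    // Hypothetical stand-in for the question's DropResult.
    static class DropResult {
        final int id;
        DropResult(int id) { this.id = id; }
    }

    // Hypothetical stand-in for PipeLinkageData: each doDrop() call yields a new result.
    static class PipeLinkageData {
        private final AtomicInteger counter = new AtomicInteger();
        DropResult doDrop() { return new DropResult(counter.incrementAndGet()); }
    }

    // map-style: one doDrop() call per index, with the results collected into a new list.
    static List<DropResult> simulate(PipeLinkageData pld, int n) {
        return IntStream.range(0, n)
                .mapToObj(i -> pld.doDrop())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        PipeLinkageData pld = new PipeLinkageData();
        // forEach only runs a side effect and returns void, so nothing can be assigned from it.
        IntStream.range(0, 3).forEach(i -> pld.doDrop());
        List<DropResult> results = simulate(pld, 5);
        System.out.println(results.size()); // 5
    }
}
```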

If this is the case, I don't think you need to explicitly create a list as above. You should be able to use makeRange() the way the Spark Pi example does, like this:

JavaRDD<DropResult> simCountRDD = spark.parallelize(makeRange(1, getSimCount())).map(new Function<Integer, DropResult>() {
  // The index i is ignored; it only drives one doDrop() call per element.
  public DropResult call(Integer i) {
    return pld.doDrop();
  }
});
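
One caveat with this approach: Spark serializes the function, together with everything it captures, in order to ship it to the executors, so pld has to be serializable (and an anonymous class declared inside an instance method also pulls in the enclosing object). The sketch below imitates that step with plain Java serialization rather than Spark; SerializableFunction, PipeLinkageData, and DropResult are hypothetical stand-ins for Spark's Function interface and the question's classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ClosureSerializationDemo {
    // Hypothetical stand-in for Spark's Function<T, R>, which is also Serializable.
    interface SerializableFunction<T, R> extends Serializable {
        R call(T input);
    }

    // Hypothetical stand-ins for the question's classes.
    static class DropResult implements Serializable {
        final double value;
        DropResult(double value) { this.value = value; }
    }

    static class PipeLinkageData implements Serializable {
        final double scale;
        PipeLinkageData(double scale) { this.scale = scale; }
        DropResult doDrop() { return new DropResult(scale * Math.random()); }
    }

    // Built in a static context, so the anonymous class captures only pld.
    static SerializableFunction<Integer, DropResult> dropFunction(final PipeLinkageData pld) {
        return new SerializableFunction<Integer, DropResult>() {
            public DropResult call(Integer i) {
                return pld.doDrop();
            }
        };
    }

    // Serializes and deserializes an object, roughly what happens when a task is shipped.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("function is not serializable", e);
        }
    }

    public static void main(String[] args) {
        SerializableFunction<Integer, DropResult> f =
            roundTrip(dropFunction(new PipeLinkageData(2.0)));
        DropResult r = f.call(1);
        System.out.println(r.value >= 0.0 && r.value < 2.0); // true
    }
}
```

If PipeLinkageData did not implement Serializable, writeObject would throw java.io.NotSerializableException, which is the kind of failure Spark surfaces as "Task not serializable".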
