Apache Spark mapPartitionsWithIndex [英] Apache Spark mapPartitionsWithIndex

查看:179
本文介绍了Apache Spark mapPartitionsWithIndex的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有人能举例说明在Java中正确使用mapPartitionsWithIndex吗?我发现了许多Scala示例,但缺少Java。
我的理解是正确的,当使用此函数时,单独的节点将处理单独的分区。

Can someone give example of correct usage of mapPartitionsWithIndex in Java? I've found a lot of Scala examples, but there is lack of Java ones. Is my understanding correct that separate partitions will be handled by separate nodes when using this function.

我收到以下错误

method mapPartitionsWithIndex in class JavaRDD<T> cannot be applied to given types;
    JavaRDD<String> rdd = sc.textFile(filename).mapPartitionsWithIndex
    required: Function2<Integer,Iterator<String>,Iterator<R>>,boolean
    found: <anonymous Function2<Integer,Iterator<String>,Iterator<JavaRDD<String>>>>

执行时

JavaRDD<String> rdd = sc.textFile(filename).mapPartitionsWithIndex(
    new Function2<Integer, Iterator<String>, Iterator<JavaRDD<String>> >() {

    @Override
    public Iterator<JavaRDD<String>> call(Integer ind, String s) { 


推荐答案

以下是我用来删除csv文件第一行的代码:

Here is the code I use to remove the first line of a csv file:

JavaRDD<String> rawInputRdd = sparkContext.textFile(dataFile);

Function2 removeHeader= new Function2<Integer, Iterator<String>, Iterator<String>>(){
    @Override
    public Iterator<String> call(Integer ind, Iterator<String> iterator) throws Exception {
        if(ind==0 && iterator.hasNext()){
            iterator.next();
            return iterator;
        }else
            return iterator;
    }
};
JavaRDD<String> inputRdd = rawInputRdd.mapPartitionsWithIndex(removeHeader, false);

这篇关于Apache Spark mapPartitionsWithIndex的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆