Remove first element in RDD without using filter function

Views: 68 | Published: 2021/4/8 19:56:34 | Tags: scala, apache-spark, rdd

Question

I have built an RDD from a file, where each element of the RDD is a section of the file separated by a delimiter.

val inputRDD1: RDD[(String, Long)] = myUtilities.paragraphFile(spark, path1)
  .coalesce(100 * spark.defaultParallelism)
  .zipWithIndex()            // RDD[(String, Long)]
  .filter(f => f._2 != 0)    // drop the element whose index is 0

The reason I apply the last operation above (filter) is to remove the element with index 0, i.e. the first element.

Is there a better way to remove the first element than checking the index value of every element, as done above?

Thanks!

Recommended answer

One possibility is to use RDD.mapPartitionsWithIndex and remove the first element from the iterator of the partition with index 0:

val inputRDD = myUtilities
  .paragraphFile(spark, path1)
  .coalesce(100 * spark.defaultParallelism)
  .mapPartitionsWithIndex(
    // Skip one item in partition 0 only; pass every other partition through unchanged.
    (index, it) => if (index == 0) it.drop(1) else it,
    preservesPartitioning = true
  )

This way, only the iterator of the first partition is advanced, and only by a single item; all other partitions remain untouched. Is this more efficient? Probably. In any case, I'd test both versions to see which one performs better.
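To check that the two versions agree before comparing their performance, a small local-mode sketch along the following lines may help. It is not the asker's actual pipeline: `myUtilities.paragraphFile` is replaced by `parallelize` over a made-up list of strings, and the object name, helper names, and sample data are all hypothetical. It assumes Spark is on the classpath.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import scala.reflect.ClassTag

// Hypothetical helper object, used only for this illustration.
object DropFirstElement {

  // Version from the question: index every element, then filter out index 0.
  def dropFirstWithFilter[T: ClassTag](rdd: RDD[T]): RDD[T] =
    rdd.zipWithIndex().filter { case (_, idx) => idx != 0L }.map(_._1)

  // Version from the answer: skip one item in the iterator of partition 0 only.
  def dropFirstInPartitionZero[T: ClassTag](rdd: RDD[T]): RDD[T] =
    rdd.mapPartitionsWithIndex(
      (index, it) => if (index == 0) it.drop(1) else it,
      preservesPartitioning = true
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("drop-first-element")
      .getOrCreate()
    try {
      // Stand-in for the sections read from the file (assumed sample data).
      val sections = spark.sparkContext
        .parallelize(Seq("header", "a", "b", "c", "d", "e"), numSlices = 3)

      val viaFilter = dropFirstWithFilter(sections).collect().toSeq
      val viaPartition = dropFirstInPartitionZero(sections).collect().toSeq

      // Both approaches should remove exactly the first element.
      assert(viaFilter == viaPartition)
      assert(!viaPartition.contains("header"))
    } finally {
      spark.stop()
    }
  }
}
```

Two differences from the question's code are worth keeping in mind. The mapPartitionsWithIndex version yields the elements without their indices, so if the Long index is needed later, zipWithIndex is still required. It also relies on partition 0 being non-empty: if the first partition happened to be empty, nothing would be dropped, whereas the zipWithIndex version would still remove the first element.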
