Spark: Proper way to remove the last element of an RDD[String]

Published 2021/4/8 19:45:08; tags: scala, performance, apache-spark


Problem Description

I am trying to remove the last element of an RDD[String].

So far I'm doing this:

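// First Spark job: count every element of the RDD
val n: Long = rdd.count()
val startIndex: Long = n - 1

// zipWithIndex() numbers elements 0 to n - 1 in partition order (it runs its
// own job when the RDD has more than one partition); keep only the last index
// and bring that single value back to the driver
val lastElem = rdd.zipWithIndex()
  .filter { case (_, index) => index >= startIndex }
  .keys
  .collect()

// Drops every element equal to the last one (ignoring case), not only the last position
val newRdd = rdd.filter(x => !x.equalsIgnoreCase(lastElem(0))).cache()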

That is, I take the last element of the RDD and filter it out, keeping all the other elements.

This works, but is there a better way to do it?
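Note that the value-based filter in the question, filter(x => !x.equalsIgnoreCase(lastElem(0))), removes every element equal to the last one (ignoring case), not just the final position. The plain-Scala sketch below uses a hypothetical local Vector in place of the RDD, so it runs without Spark, to contrast that behaviour with an index-based filter. The helper names dropByValue and dropByIndex are illustrative only.

```scala
// Hypothetical local data standing in for the RDD's contents, in RDD order.
val data: Vector[String] = Vector("a", "B", "c", "b")

// Value-based removal (what the question's filter does): every element equal
// to the last one, ignoring case, is dropped, so "B" disappears as well.
def dropByValue(xs: Vector[String]): Vector[String] =
  if (xs.isEmpty) xs
  else xs.filterNot(_.equalsIgnoreCase(xs.last))

// Index-based removal: pair each element with its position and keep every
// position except the final one, so earlier duplicates survive.
def dropByIndex(xs: Vector[String]): Vector[String] =
  xs.zipWithIndex.collect { case (x, i) if i < xs.length - 1 => x }

println(dropByValue(data)) // Vector(a, c)
println(dropByIndex(data)) // Vector(a, B, c)
```

On an actual RDD, the index-based variant would be rdd.zipWithIndex().filter { case (_, i) => i < n - 1 }.keys with n = rdd.count(); it keeps the data distributed and needs no collect at all.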

Recommended Answer

Scala collections have an init method, which returns all elements of a collection except the last one. You can use it here:

// Collect every element to the driver, drop the last one with init, and redistribute the rest
val newRdd = sc.parallelize(rdd.collect().toList.init)

This gives you a new RDD with the last element removed, and it is better than your approach because it needs only one action (collect), whereas yours runs count() and then a second pass to find the last element.
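Because init is an ordinary Scala collection method, its behaviour is easy to check locally. The sketch below (plain Scala, with a hypothetical list standing in for rdd.collect().toList) shows that init drops only the final element, keeps earlier duplicates, and throws on an empty collection, whereas dropRight(1) simply returns an empty result.

```scala
import scala.util.Try

// Hypothetical data standing in for rdd.collect().toList
val lines: List[String] = List("a", "b", "a", "c")

// init returns every element except the last one; the earlier "a" is kept
val withoutLast: List[String] = lines.init

// On an empty collection init throws, while dropRight(1) returns an empty list
val initOnEmpty: Try[List[String]] = Try(List.empty[String].init)
val dropRightOnEmpty: List[String] = List.empty[String].dropRight(1)

println(withoutLast)           // List(a, b, a)
println(initOnEmpty.isFailure) // true
println(dropRightOnEmpty)      // List()
```

If the RDD might be empty, sc.parallelize(rdd.collect().toList.dropRight(1)) avoids the exception; either way, the whole RDD still passes through driver memory.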

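Also, an RDD is distributed, so you cannot tell which string is the last one without collecting it onto a single node.

I collected it on the driver node. You could instead use another technique to gather it on an executor and then apply init there.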
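The answer does not spell out the executor-side technique. One possible approach, offered here as an assumption rather than the answerer's method, is to count the elements in each partition, locate the last non-empty partition, and drop only its final element inside mapPartitionsWithIndex, so only the per-partition counts reach the driver. To keep the sketch runnable without Spark, it models the RDD as a hypothetical Seq of partitions (dropLastElement is an illustrative name); the comments show the corresponding RDD calls.

```scala
// Hypothetical model of an RDD: each inner Seq is one partition, in partition order.
// In Spark the equivalent steps would be (not executed here):
//   val sizes = rdd.mapPartitions(it => Iterator(it.size)).collect()
//   val last  = sizes.lastIndexWhere(_ > 0)
//   rdd.mapPartitionsWithIndex((i, it) => if (i == last) it.take(sizes(i) - 1) else it)
def dropLastElement[A](partitions: Seq[Seq[A]]): Seq[Seq[A]] = {
  // Step 1: count elements per partition (the only data sent to the driver)
  val sizes: Seq[Int] = partitions.map(_.size)
  // Step 2: the last element lives in the last partition that is not empty
  val lastNonEmpty: Int = sizes.lastIndexWhere(_ > 0)
  // Step 3: trim only that partition and leave all others untouched;
  // if every partition is empty, lastNonEmpty is -1 and nothing changes
  partitions.zipWithIndex.map { case (part, i) =>
    if (i == lastNonEmpty) part.take(sizes(i) - 1) else part
  }
}

val parts = Seq(Seq("a", "b"), Seq("c"), Seq.empty[String])
println(dropLastElement(parts).flatten.mkString(",")) // a,b
```

Because the RDD is evaluated twice (once for the counts, once for the result), caching it first avoids recomputation, and both passes must see the same partition order for the result to be correct.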

