Spark: how to remove the last line in a CSV file

Views: 90 Published: 2021/4/8 20:19:01 Tags: apache-spark spark-dataframe rdd


Question

I am new to Spark. I want to remove the header and the last lines from a CSV file:

      Notes  xyz
     "id","member_id"
     "60045257","63989975",
     "60981766","65023535",

     Total amount:4444228900
     Total amount: 133826689

I want to remove the lines Notes xyz, Total amount:4444228900 and Total amount: 133826689 from the file. I have already removed the first line of the file:

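// Read the file as an RDD of lines
val dfRetail = sc.textFile("file:////home/cloudera/Projects/Project3/test/test_3.csv")
// first() returns the first line of the file ("Notes  xyz" in the sample above)
val header = dfRetail.first()
// Keep every line that differs from that first line
val final_data = dfRetail.filter(row => row != header)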

How do I remove the last few lines?

Recommended answer

Use zipWithIndex and then filter by index:

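// Total number of lines in the file
val total = dfRetail.count()
val withoutFooter = dfRetail.zipWithIndex()                // pairs of (line, index)
                            .filter(x => x._2 < total - 3) // drop the last 3 lines
                            .map(x => x._1)                // keep only the line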

This maps each line to a pair (line, index). You then filter the RDD, keeping only the pairs whose index is lower than the total number of lines minus 3, which drops the footer. Finally, you map each pair back to its first element, which is the line of the document itself.
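To make the indexing concrete, here is a minimal sketch that applies the same zipWithIndex, filter, map chain to a plain Scala Vector instead of an RDD, so it runs without a Spark cluster. The object name ZipWithIndexSketch and the helper dropFooter are hypothetical, and the sketch assumes the blank line before the totals is part of the file, which is why 3 lines are dropped. Note that a local collection yields Int indices, whereas RDD.zipWithIndex yields Long indices.

```scala
object ZipWithIndexSketch {
  // Sample lines from the question, including the blank line before the totals
  // (assumption: that blank line is really present in the file).
  val lines: Vector[String] = Vector(
    "Notes  xyz",
    "\"id\",\"member_id\"",
    "\"60045257\",\"63989975\",",
    "\"60981766\",\"65023535\",",
    "",
    "Total amount:4444228900",
    "Total amount: 133826689"
  )

  // Hypothetical helper: drop the last n lines by filtering on the index.
  def dropFooter(input: Vector[String], n: Int): Vector[String] = {
    val total = input.size
    input.zipWithIndex                                  // (line, index) pairs
      .filter { case (_, index) => index < total - n }  // keep all but the last n
      .map { case (line, _) => line }                   // back to plain lines
  }

  def main(args: Array[String]): Unit =
    dropFooter(lines, 3).foreach(println)
}
```

With n = 3, the pairs with indices 4, 5 and 6 (the blank line and the two totals) are filtered out, and the first four lines remain.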

You can also use mapPartitionsWithIndex:

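// Partition indices are 0-based, so the last one is (number of partitions - 1)
val lastPartition = dfRetail.partitions.length - 1
val withoutFooter = dfRetail.mapPartitionsWithIndex { (idx, iter) =>
     if (idx == lastPartition) {
         // Buffer the partition first: calling iter.size would exhaust the iterator.
         // Assumes all 3 footer lines are in the last partition.
         iter.toArray.dropRight(3).iterator
     }
     else iter
}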

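It works the same way but may be faster, because it does not need the separate count() pass over the data. It does assume that all three footer lines fall in the last partition.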
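The per-partition logic can be tried out locally with a small simulation. This is only a sketch: mapPartitionsWithIndexLocal is a hypothetical stand-in for RDD.mapPartitionsWithIndex in which each inner Vector plays the role of one partition, and the split of the sample lines into three partitions is an assumption chosen so that the whole footer sits in the last one.

```scala
object LastPartitionSketch {
  // Hypothetical local stand-in for RDD.mapPartitionsWithIndex:
  // each inner Vector plays the role of one partition.
  def mapPartitionsWithIndexLocal(
      partitions: Vector[Vector[String]]
  )(f: (Int, Iterator[String]) => Iterator[String]): Vector[String] =
    partitions.zipWithIndex.flatMap { case (part, idx) => f(idx, part.iterator) }

  // Drop the last n lines, assuming they all live in the last partition.
  def dropFooter(partitions: Vector[Vector[String]], n: Int): Vector[String] = {
    val lastPartition = partitions.size - 1 // partition indices are 0-based
    mapPartitionsWithIndexLocal(partitions) { (idx, iter) =>
      if (idx == lastPartition) {
        // Buffer first: an Iterator can be traversed only once,
        // so asking for its size before using it would leave it empty.
        iter.toVector.dropRight(n).iterator
      } else iter
    }
  }

  // Assumed partitioning of the sample file from the question.
  val partitions: Vector[Vector[String]] = Vector(
    Vector("Notes  xyz", "\"id\",\"member_id\""),
    Vector("\"60045257\",\"63989975\",", "\"60981766\",\"65023535\","),
    Vector("", "Total amount:4444228900", "Total amount: 133826689")
  )

  def main(args: Array[String]): Unit =
    dropFooter(partitions, 3).foreach(println)
}
```

If the footer could straddle two partitions (for example, a last partition holding fewer than 3 lines), this approach would leave part of the footer in place, and the zipWithIndex variant is the safer choice.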
