How to delete the first few rows of a DataFrame in Scala/Spark?
Problem description
I have a DataFrame and I want to delete the first and second rows. What should I do?
This is my input:
+-----+
|value|
+-----+
|    1|
|    4|
|    3|
|    5|
|    4|
|   18|
+-----+
This is the expected result:
+-----+
|value|
+-----+
|    3|
|    5|
|    4|
|   18|
+-----+
Recommended answer
In my opinion, it does not make sense to speak of a first or second record unless you can define an ordering for your DataFrame. The order in which show prints the records is "arbitrary" and depends on how your data is partitioned.
Suppose you have a column by which you can order your records. In that case you can use window functions. Starting with this DataFrame:
+----+-----+
|year|value|
+----+-----+
|2007|    1|
|2008|    4|
|2009|    3|
|2010|    5|
|2011|    4|
|2012|   18|
+----+-----+
you can do the following:
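import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
import spark.implicits._  // enables the $"col" syntax; assumes a SparkSession named spark

df
  .withColumn("rn", row_number().over(Window.orderBy($"year")))  // number the rows by year
  .where($"rn" > 2)  // keep everything after the first two rows
  .drop("rn")
  .show()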
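For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. The object name DropFirstRows, the helper dropFirst, and the local SparkSession settings are assumptions made for illustration and are not part of the original answer. It rebuilds the example DataFrame, numbers the rows by year, and filters out the first two.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object DropFirstRows {
  // Hypothetical helper (not a Spark API): drops the first `n` rows of `df`
  // as ordered by the column `orderCol`.
  def dropFirst(df: DataFrame, orderCol: String, n: Int): DataFrame = {
    // The window has no partitionBy, so Spark pulls all rows into a single
    // partition to number them. Fine for small data, costly for large data.
    val byOrderCol = Window.orderBy(col(orderCol))
    df.withColumn("rn", row_number().over(byOrderCol)) // 1-based row number
      .where(col("rn") > n)                            // keep rows after the first n
      .drop("rn")                                      // remove the helper column
  }

  def main(args: Array[String]): Unit = {
    // Local session, assumed here only so the sketch can run on its own.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("drop-first-rows")
      .getOrCreate()
    import spark.implicits._

    // Same data as the example DataFrame above.
    val df = Seq((2007, 1), (2008, 4), (2009, 3), (2010, 5), (2011, 4), (2012, 18))
      .toDF("year", "value")

    dropFirst(df, "year", 2).show()

    spark.stop()
  }
}
```

With the sample data this should leave only the rows for 2009 through 2012. Because the window is unpartitioned, Spark typically logs a warning about moving all data to a single partition, so this pattern is best suited to DataFrames that fit comfortably on one executor.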