Difference between dense rank and row number in Spark
Question
I am trying to understand the difference between dense rank and row number. In each new window partition, both start from 1. Doesn't the rank of a row always start from 1 as well? Any help would be appreciated.
Recommended answer
The difference shows up when there are "ties" in the ordering column. Check the example below:
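import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
// Needed for toDF outside spark-shell; assumes a SparkSession named `spark` is in scope.
import spark.implicits._

// One partition ("a") whose ordering column contains a tie (10 appears twice).
val df = Seq(("a", 10), ("a", 10), ("a", 20)).toDF("col1", "col2")
val windowSpec = Window.partitionBy("col1").orderBy("col2")

df
  .withColumn("rank", rank().over(windowSpec))
  .withColumn("dense_rank", dense_rank().over(windowSpec))
  .withColumn("row_number", row_number().over(windowSpec))
  .show()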
+----+----+----+----------+----------+
|col1|col2|rank|dense_rank|row_number|
+----+----+----+----------+----------+
|   a|  10|   1|         1|         1|
|   a|  10|   1|         1|         2|
|   a|  20|   3|         2|         3|
+----+----+----+----------+----------+
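Note that the value 10 appears twice in col2 within the same window (col1 = "a"). That is where the three functions differ: rank gives tied rows the same rank and then skips ahead (1, 1, 3), dense_rank gives tied rows the same rank without leaving a gap (1, 1, 2), and row_number numbers every row consecutively regardless of ties (1, 2, 3). All three start from 1 in each partition.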
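To make the rule behind each column concrete, here is a minimal plain-Scala sketch that numbers the rows of a single, already sorted partition by hand. It does not use Spark and is not how Spark implements these functions; the names RankSketch, Numbered and numberRows are hypothetical and exist only for this illustration.

```scala
object RankSketch {
  // One output row: the ordering value plus the three numbering schemes.
  final case class Numbered(value: Int, rank: Int, denseRank: Int, rowNumber: Int)

  // Assumes `sorted` holds the ordering column of ONE partition, already sorted.
  def numberRows(sorted: List[Int]): List[Numbered] = {
    var rank = 0                      // row position of the first row in the current tie group
    var denseRank = 0                 // count of distinct values seen so far
    var previous: Option[Int] = None  // ordering value of the previous row

    sorted.zipWithIndex.map { case (value, index) =>
      val rowNumber = index + 1         // always advances by one, ties or not
      if (!previous.contains(value)) {  // a new value starts a new tie group
        rank = rowNumber                // rank jumps to the row position, leaving gaps
        denseRank += 1                  // dense rank advances by exactly one
      }
      previous = Some(value)
      Numbered(value, rank, denseRank, rowNumber)
    }
  }

  def main(args: Array[String]): Unit =
    numberRows(List(10, 10, 20)).foreach(println)
}
```

For the input List(10, 10, 20) this yields the same (rank, dense_rank, row_number) triples as the table above: (1, 1, 1), (1, 1, 2) and (3, 2, 3).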