Difference between dense rank and row number in Spark

Question

I am trying to understand the difference between dense rank and row number. In each new window partition, both start from 1. Doesn't the rank of a row always start from 1 as well? Any help would be appreciated.

Recommended answer

The difference shows up when there are "ties" in the ordering column. Consider the example below:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
// toDF needs spark.implicits._, which spark-shell imports automatically

// Two rows tie on col2 = 10 within the partition col1 = "a"
val df = Seq(("a", 10), ("a", 10), ("a", 20)).toDF("col1", "col2")

// One window per col1 value, ordered by col2
val windowSpec = Window.partitionBy("col1").orderBy("col2")

df
  .withColumn("rank", rank().over(windowSpec))
  .withColumn("dense_rank", dense_rank().over(windowSpec))
  .withColumn("row_number", row_number().over(windowSpec)).show()

+----+----+----+----------+----------+
|col1|col2|rank|dense_rank|row_number|
+----+----+----+----------+----------+
|   a|  10|   1|         1|         1|
|   a|  10|   1|         1|         2|
|   a|  20|   3|         2|         3|
+----+----+----+----------+----------+

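Note that the value 10 appears twice in col2 within the same window (col1 = "a"). Ties are where the three functions differ. rank gives tied rows the same value and then skips ahead (1, 1, 3). dense_rank gives tied rows the same value without leaving a gap (1, 1, 2). row_number always assigns consecutive, unique numbers (1, 2, 3), so the order it gives to tied rows is arbitrary. All three restart at 1 in every partition.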
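
To make the three rules concrete outside of Spark, here is a minimal plain-Scala sketch that recomputes them for a single partition. The object name RankingSketch and the helper rankings are hypothetical names chosen for illustration. This is not how Spark implements its window functions, only a restatement of the definitions.

```scala
// Plain-Scala sketch of the three ranking rules for ONE partition.
// RankingSketch and rankings are illustrative names, not part of Spark.
object RankingSketch {
  // Returns (value, rank, denseRank, rowNumber) for each value, in ascending order.
  def rankings(values: Seq[Int]): Seq[(Int, Int, Int, Int)] = {
    val sorted = values.sorted
    val distinctSorted = sorted.distinct
    sorted.zipWithIndex.map { case (v, i) =>
      // rank: 1-based position of the first row holding this value,
      // so tied rows share a rank and the next distinct value skips ahead.
      val rank = sorted.indexOf(v) + 1
      // dense_rank: 1-based position among distinct values, so no gaps appear.
      val denseRank = distinctSorted.indexOf(v) + 1
      // row_number: plain 1-based position in the sorted order.
      val rowNumber = i + 1
      (v, rank, denseRank, rowNumber)
    }
  }

  def main(args: Array[String]): Unit =
    rankings(Seq(10, 10, 20)).foreach(println)
}
```

For the col2 values from the example (10, 10, 20), this yields the same rank, dense_rank and row_number columns as the Spark output above.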
