Create a group id over a window in Spark Dataframe
Question
I have a dataframe where I want to assign an id within each Window partition. For example I have
id | col |
1 | a |
2 | a |
3 | b |
4 | c |
5 | c |
So I want (based on grouping by column col)
id | group |
1 | 1 |
2 | 1 |
3 | 2 |
4 | 3 |
5 | 3 |
I want to use a window function, but I cannot find any way to assign an id to each window. I need something like:
w = Window().partitionBy('col')
df = df.withColumn("group", id().over(w))
Is there any way to achieve something like that? (I cannot simply use col as the group id because I am interested in creating a window over multiple columns.)
Answer
Simply using the dense_rank built-in function over a Window should give you your desired result:
from pyspark.sql import Window as W
import pyspark.sql.functions as f

# dense_rank assigns equal ranks to rows with the same col value,
# with no gaps in the ranking, so it serves directly as a group id
df.select('id', f.dense_rank().over(W.orderBy('col')).alias('group')).show(truncate=False)
which should give you
+---+-----+
|id |group|
+---+-----+
|1 |1 |
|2 |1 |
|3 |2 |
|4 |3 |
|5 |3 |
+---+-----+