How to use a different window specification per column value?

Question

This is my partitionBy condition, which I need to change based on a column value in the DataFrame.

import org.apache.spark.sql.expressions.Window
val windowSpec = Window.partitionBy("col1", "col2", "col3").orderBy($"col5".desc)
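A WindowSpec does nothing on its own; it is passed to a window function through `.over(...)`. The question does not say which window function is used, so the sketch below assumes Spark 3.x on the classpath, a local session, hypothetical sample rows, and a hypothetical goal of keeping the highest-`col5` row per (col1, col2, col3) group with `row_number`.

```scala
// Sketch only: assumes Spark 3.x, a local session and hypothetical data.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

val spark = SparkSession.builder().master("local[1]").appName("window-spec-demo").getOrCreate()
import spark.implicits._

// Hypothetical rows: the first two share group (0, 0, 0), the third is in group (0, 1, 0)
val df = Seq(
  (0, 0, 0, 5, "I"),
  (0, 0, 0, 7, "I"),
  (0, 1, 0, 3, "I")
).toDF("col1", "col2", "col3", "col5", "col6")

val windowSpec = Window.partitionBy("col1", "col2", "col3").orderBy($"col5".desc)

// row_number restarts at 1 in every partition; rn = 1 marks the highest col5 per group
val ranked = df.withColumn("rn", row_number().over(windowSpec))
val latestPerGroup = ranked.filter($"rn" === 1).drop("rn")
latestPerGroup.show()
```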

Now, if the value of one of the columns (col6) in the DataFrame is I, the condition above should apply.

But when the value of col6 is O, the following condition should apply instead:

val windowSpec = Window.partitionBy("col1", "col3").orderBy($"col5".desc)

How can I implement this in a Spark DataFrame?

In other words, for each record it should check whether col6 is I or O and apply the corresponding partitionBy condition.
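If the partitioning really has to switch per record, as described above, one option (not what the recommended answer below does) is a single window whose partition key is conditional: `when(col6 === "I", col2)` contributes col2 for I rows and null for O rows, so O rows are effectively partitioned by col1 and col3 only. The sketch assumes Spark 3.x, hypothetical data, that col6 only holds "I" or "O", and that I rows and O rows should be ranked in separate windows (hence col6 in the key).

```scala
// Sketch only: assumes Spark 3.x, hypothetical data, col6 in {"I", "O"}.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{row_number, when}

val spark = SparkSession.builder().master("local[1]").appName("per-row-window").getOrCreate()
import spark.implicits._

val df = Seq(
  (0, 0, 0, 5, "I"),
  (0, 1, 0, 7, "I"), // different col2 from the row above, and I rows partition on col2: own window
  (0, 0, 0, 4, "O"),
  (0, 1, 0, 6, "O")  // col2 is ignored for O rows, so it shares a window with the row above
).toDF("col1", "col2", "col3", "col5", "col6")

// when(...) without otherwise(...) yields null for "O" rows, so their effective key is
// (col1, col3, col6); for "I" rows it is (col1, col3, col6, col2).
// Including col6 keeps "I" and "O" rows in separate windows (an assumption).
val perRowSpec = Window
  .partitionBy($"col1", $"col3", $"col6", when($"col6" === "I", $"col2"))
  .orderBy($"col5".desc)

val ranked = df.withColumn("rn", row_number().over(perRowSpec))
ranked.show()
```

An equivalent alternative is to split the DataFrame by col6, apply each of the two specs from the question to its half, and combine the results with `unionByName`; the conditional key keeps it to one window operation.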

Recommended answer

Given the requirement to select the final window specification based on the values of the col6 column, I'd filter first and then do the final window aggregation.

scala> dataset.show
+----+----+----+----+----+
|col1|col2|col3|col5|col6|
+----+----+----+----+----+
|   0|   0|   0|   0|   I| // <-- triggers 3 columns to use
|   0|   0|   0|   0|   O| // <-- the aggregation should use just 2 columns
+----+----+----+----+----+

With the above dataset, I'd filter to check whether there is at least one I in col6 and choose the window specification accordingly.

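import org.apache.spark.sql.expressions.Window  // $"..." also needs spark.implicits._ (pre-imported in spark-shell)

val windowSpecForIs = Window.partitionBy("col1", "col2", "col3").orderBy($"col5".desc)
val windowSpecForOs = Window.partitionBy("col1", "col3").orderBy($"col5".desc)

// take(1) only needs one matching row, so there is no need to count every I
val noIs = dataset.filter($"col6" === "I").take(1).isEmpty
val windowSpec = if (noIs) windowSpecForOs else windowSpecForIs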
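The answer stops once `windowSpec` is chosen. To see the approach end to end, the sketch below rebuilds the two-row dataset shown above, picks the spec the same way, and applies it; the local session setup and the use of `row_number` are assumptions, not part of the answer.

```scala
// Sketch only: assumes Spark 3.x and a local session; row_number is an illustrative choice.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

val spark = SparkSession.builder().master("local[1]").appName("choose-window-spec").getOrCreate()
import spark.implicits._

// The two-row dataset shown in the answer
val dataset = Seq(
  (0, 0, 0, 0, "I"),
  (0, 0, 0, 0, "O")
).toDF("col1", "col2", "col3", "col5", "col6")

val windowSpecForIs = Window.partitionBy("col1", "col2", "col3").orderBy($"col5".desc)
val windowSpecForOs = Window.partitionBy("col1", "col3").orderBy($"col5".desc)

// A single check over the whole dataset: is there any row with col6 = "I"?
val noIs = dataset.filter($"col6" === "I").take(1).isEmpty
val windowSpec = if (noIs) windowSpecForOs else windowSpecForIs

// The chosen spec applies to every row; here both rows fall into partition (0, 0, 0)
val result = dataset.withColumn("rn", row_number().over(windowSpec))
result.show()
```

Because `noIs` is a single Boolean, every row gets the same partitioning: the three-column spec whenever at least one I exists anywhere in the dataset. If the partitioning must differ row by row, the partition key itself has to depend on col6.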
