How to use a different window specification per column value?
Question
This is my partitionBy condition, which I need to change based on a column value in the data frame:
val windowSpec = Window.partitionBy("col1", "col2", "col3").orderBy($"col5".desc)
Now, if the value of one of the columns (col6) in the data frame is I, the condition above applies.
But when the value of the column (col6) changes to O, the condition below applies instead:
val windowSpec = Window.partitionBy("col1", "col3").orderBy($"col5".desc)
How can I implement this with a Spark data frame?
So for each record, it should check whether col6 is I or O and apply the corresponding partitionBy condition.
Answer
Given the requirement to select the final window specification based on the values of the col6 column, I'd do a filter first, followed by the final window aggregation.
scala> dataset.show
+----+----+----+----+----+
|col1|col2|col3|col5|col6|
+----+----+----+----+----+
| 0| 0| 0| 0| I| // <-- triggers 3 columns to use
| 0| 0| 0| 0| O| // <-- the aggregation should use just 2 columns
+----+----+----+----+----+
With the above dataset, I'd filter to check whether there is at least one I in col6, and apply the corresponding window specification:
val windowSpecForIs = Window.partitionBy("col1", "col2", "col3").orderBy($"col5".desc)
val windowSpecForOs = Window.partitionBy("col1", "col3").orderBy($"col5".desc)
val noIs = dataset.filter($"col6" === "I").take(1).isEmpty
val windowSpec = if (noIs) windowSpecForOs else windowSpecForIs
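The selection logic above can be sketched end to end. This is a minimal runnable example, not the asker's actual pipeline: it assumes the column names shown in the sample dataset (col2/col3), and it uses row_number as a stand-in for whatever window aggregation is ultimately needed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object WindowSpecByCol6 {

  // Pick the window spec once for the whole DataFrame, as the answer suggests,
  // then apply a sample aggregation (row_number) over the chosen spec.
  def rankByCol6(dataset: DataFrame): DataFrame = {
    val windowSpecForIs = Window.partitionBy("col1", "col2", "col3").orderBy(col("col5").desc)
    val windowSpecForOs = Window.partitionBy("col1", "col3").orderBy(col("col5").desc)

    // If no row has col6 == "I", fall back to the two-column spec.
    val noIs = dataset.filter(col("col6") === "I").take(1).isEmpty
    val windowSpec = if (noIs) windowSpecForOs else windowSpecForIs

    dataset.withColumn("rn", row_number.over(windowSpec))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("window-per-col6").getOrCreate()
    import spark.implicits._

    // Same shape as the sample dataset in the answer.
    val dataset = Seq(
      (0, 0, 0, 0, "I"),
      (0, 0, 0, 0, "O")
    ).toDF("col1", "col2", "col3", "col5", "col6")

    rankByCol6(dataset).show()
    spark.stop()
  }
}
```

Note that this picks a single specification for the entire DataFrame, which is what the answer proposes; a true per-record choice (as the question literally asks) would instead require splitting the DataFrame by col6, applying each spec to its own subset, and unioning the results.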