max in window functions
Problem description
Input DF:
id  sub_id  id_created  id_last_modified  sub_id_created  lead_
1   10      12:00       7:00              12:00           1:00
1   20      12:00       7:00              1:00            2:30
1   30      12:00       7:00              2:30            7:00
1   40      12:00       7:05              7:00            null
Use case: I am trying to create a new column "time", where:
1. For (id, max(sub_id)): time = id_last_modified - sub_id_created
2. Otherwise: time = sub_id_created - lead_
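The two rules above can be sketched on plain Scala collections, without Spark, to pin down what each row should get. This is only an illustration under assumptions: `SubRow`, `TimeRule`, and `timeColumn` are hypothetical names, and times are modelled as minutes since midnight on a 24-hour clock, which the question does not state.

```scala
// Hypothetical row type (not from the question); times are minutes since midnight.
final case class SubRow(id: Int, subId: Int, idLastModified: Int, subIdCreated: Int)

object TimeRule {
  // Returns (id, subId) -> time, following the two rules in the question:
  //   last sub_id of an id : idLastModified - subIdCreated
  //   every other row      : subIdCreated - lead (the next row's subIdCreated)
  def timeColumn(rows: Seq[SubRow]): Map[(Int, Int), Int] =
    rows.groupBy(_.id).values.flatMap { group =>
      val sorted = group.sortBy(_.subId)
      // The lead of each row; the last row of a group has none.
      val leads: Seq[Option[Int]] = sorted.drop(1).map(r => Some(r.subIdCreated)) :+ None
      sorted.zip(leads).map {
        case (r, Some(next)) => (r.id, r.subId) -> (r.subIdCreated - next)
        case (r, None)       => (r.id, r.subId) -> (r.idLastModified - r.subIdCreated)
      }
    }.toMap
}
```

The `None` case plays the role of the null that `lead` yields on the last row of a partition. Because arithmetic on null stays null, that row needs its own branch. Note that rule 2, taken as written, produces negative values whenever the next sub_id was created later.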
Code:
window = Window.partitionBy("id").orderBy("sub_id")
I get the expected output for every row except the (id, max(sub_id)) combination, for which I get null.
Any suggestions on where I am going wrong would be helpful. Thanks.
Recommended answer
I guess this might work:
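import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
// The $"col" syntax also needs: import spark.implicits._
// max must see the whole partition, so use a window without orderBy for it.
val byId = Window.partitionBy("id")
val result = df.withColumn("time",
  when($"sub_id" === max($"sub_id").over(byId),
    (unix_timestamp($"id_last_modified") -
      unix_timestamp($"sub_id_created")) / 3600.0).otherwise(
    (unix_timestamp($"sub_id_created") -
      unix_timestamp(lead($"sub_id_created", 1).over(window))) / 3600.0))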
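One detail worth checking when comparing against `max(...).over(...)`: if the window has an `orderBy`, Spark's default frame runs from the start of the partition to the current row, so the max is a running max and, with ascending order, equals the current `sub_id` on every row. The sketch below shows the two side by side. It assumes Spark SQL is on the classpath and uses made-up data; `FrameDemo` and `withMaxes` are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, max}

object FrameDemo {
  // Adds max(sub_id) over an ordered window and over the whole partition.
  def withMaxes(df: DataFrame): DataFrame = {
    // With orderBy, the default frame ends at the current row (running max).
    val ordered = Window.partitionBy("id").orderBy("sub_id")
    // Without orderBy, the frame is the entire partition.
    val whole = Window.partitionBy("id")
    df.withColumn("running_max", max(col("sub_id")).over(ordered))
      .withColumn("partition_max", max(col("sub_id")).over(whole))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("frame-demo").getOrCreate()
    import spark.implicits._

    // Hypothetical data: one id with four sub_ids.
    val df = Seq((1, 10), (1, 20), (1, 30), (1, 40)).toDF("id", "sub_id")
    withMaxes(df).orderBy("sub_id").show()

    spark.stop()
  }
}
```

Only `partition_max` is suitable for picking out the last `sub_id` of each `id`. An alternative is to keep the ordered window and give it an explicit frame with `rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)`.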