Spark DataFrame: accessing the previously calculated row


Question

I have the following data:

+-----+-----+----+
|Col1 |t0   |t1  |
+-----+-----+----+
| A   |null |20  |
| A   |20   |40  |
| B   |null |10  |
| B   |10   |20  |
| B   |20   |120 |
| B   |120  |140 |
| B   |140  |320 |
| B   |320  |340 |
| B   |340  |360 |
+-----+-----+----+

What I want is this:

+-----+-----+----+----+
|Col1 |t0   |t1  |grp |
+-----+-----+----+----+
| A   |null |20  |1A  |
| A   |20   |40  |1A  |
| B   |null |10  |1B  |
| B   |10   |20  |1B  |
| B   |20   |120 |2B  |
| B   |120  |140 |2B  |
| B   |140  |320 |3B  |
| B   |320  |340 |3B  |
| B   |340  |360 |3B  |
+-----+-----+----+----+

Explanation: The extra column grp is based on Col1 and on the difference between t1 and t0. Within each Col1 value, whenever that difference is too large, a new group number is started. (In the dataset above, "too large" means greater than 50.)
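
To make the rule concrete, here is a minimal plain-Scala sketch (no Spark) of the numbering for a single Col1 value. The helper name `groupLabels` and its `threshold` parameter are hypothetical and only illustrate the rule described above; they are not part of the original post.

```scala
// Hypothetical helper: labels the rows of one Col1 value by gap-based groups.
def groupLabels(key: String, t1s: Seq[Int], threshold: Int = 50): Seq[String] = {
  val sorted = t1s.sorted
  if (sorted.isEmpty) Seq.empty
  else {
    // Gap between each row and the one before it (t1 - t0 in the question's terms).
    val gaps = sorted.zip(sorted.drop(1)).map { case (prev, cur) => cur - prev }
    // The first row starts group 1; every gap above the threshold opens the next group.
    val groupNumbers = gaps.scanLeft(1)((grp, gap) => if (gap > threshold) grp + 1 else grp)
    groupNumbers.map(n => s"$n$key")
  }
}

// Values taken from the B rows of the table above.
println(groupLabels("B", Seq(10, 20, 120, 140, 320, 340, 360)))
```

The running group number only ever depends on how many large gaps have been seen so far, which is why a cumulative count is enough and no explicit access to the previous row's grp is needed.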

I build t0 with:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lag
val windowSpec = Window.partitionBy($"Col1").orderBy("t1")
df = df.withColumn("t0", lag("t1", 1) over windowSpec)

Can someone help me with this? I searched but did not find a good approach. I'm a bit lost because computing grp for a row seems to require the grp value of the previously calculated row...

Thanks

Recommended answer

I solved it myself:

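import org.apache.spark.sql.functions.{coalesce, lag, lit, sum}

// 1 if the gap to the previous row in the window exceeds 50, otherwise 0
// (the first row of each partition has no previous row, so its gap is coalesced to 0)
val grp = (coalesce(
  $"t" - lag($"t", 1).over(windowSpec),
  lit(0)
) > 50).cast("bigint")

// Running sum of the flags: a group number that starts at 0 and increases by one at every large gap
df = df.withColumn("grp", sum(grp).over(windowSpec))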

With this I no longer need both columns (t0 and t1). I can use only t1 (or t, as in the snippet above) without computing t0.

(I still need to append the value of Col1, but the most important part, the number, is done and works fine.)
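
For completeness, here is a sketch of how the remaining step might look end to end, assuming the column is named t1 as in the question and that labels should start at 1 (so the running sum is shifted by one). The object name `GroupByGap`, the function `labelGroups`, and the intermediate columns `newGroup` and `grpNum` are hypothetical names chosen for illustration. The flag is materialized as its own column before summing, which is a slight variation on the snippet above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{coalesce, col, concat, lag, lit, sum}

object GroupByGap {
  // Adds a "grp" column such as "1A" or "2B" to a DataFrame with columns Col1 and t1.
  def labelGroups(df: DataFrame, threshold: Int = 50): DataFrame = {
    val windowSpec = Window.partitionBy(col("Col1")).orderBy(col("t1"))
    df
      // 1 when the gap to the previous t1 exceeds the threshold, else 0.
      .withColumn("newGroup",
        (coalesce(col("t1") - lag(col("t1"), 1).over(windowSpec), lit(0)) > threshold).cast("bigint"))
      // Running sum of the flags, shifted so the first group is 1.
      .withColumn("grpNum", sum(col("newGroup")).over(windowSpec) + 1)
      // Append the Col1 value to the group number.
      .withColumn("grp", concat(col("grpNum").cast("string"), col("Col1")))
      .drop("newGroup", "grpNum")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("grp-sketch").getOrCreate()
    import spark.implicits._

    // Same Col1/t1 values as the table in the question.
    val df = Seq(
      ("A", 20), ("A", 40),
      ("B", 10), ("B", 20), ("B", 120), ("B", 140), ("B", 320), ("B", 340), ("B", 360)
    ).toDF("Col1", "t1")

    labelGroups(df).orderBy($"Col1", $"t1").show()
    spark.stop()
  }
}
```

On the sample data this should reproduce the grp column from the desired output in the question. If the labels may start at 0 instead, the `+ 1` can simply be dropped.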

I got the solution from: Spark SQL window function with complex condition

Thanks for your help
