How to count record changes for a particular value of a column in a Scala DataFrame

Views: 61 Published: 2021/4/8 19:44:14 Tags: scala apache-spark

Question

I have a DataFrame with the following columns and rows:

    | id|  priority|         status|       datetime|data_as_of_Date|Amount|open_close|
    |  1|Unassigned|          Fixed| 10/8/2019 0:00| 2/12/2020 0:00|    40|    Closed|
    |  1|Unassigned|            New|2/12/2019 11:00| 2/12/2020 0:00|    20|      Open|
    |  1|Unassigned|Fix in progress|9/12/2019 11:00| 2/12/2020 0:00|    90|      Open|
    |  3|  Critical|        Removed|5/17/2019 12:00| 2/12/2020 0:00|    33|    Closed|
    |  3|Unassigned|Fix in progress|5/26/2019 10:00| 2/12/2020 0:00|    30|      Open|
    |  3|  Critical|            New|  5/8/2019 3:00| 2/12/2020 0:00|    34|      Open|
    |  3|Unassigned|          Fixed| 7/29/2019 7:00| 2/12/2020 0:00|    29|    Closed|

How would I count how many times the open_close column changed per company (that is, per id)?

Recommended answer

You can use window functions. First, number each company's rows with row_number, ordered by your date column. Then use lag to bring the previous row's open_close value onto the current row, and mark the row with 1 if open_close differs from that previous value and 0 otherwise. The first row of each company has no previous value, so it is marked 0. Finally, group by the company id and sum the marks.

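import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._ // assumes a SparkSession named `spark`; enables the $"col" syntax

// lag is called without a default, so each id's first row gets null and is counted as "no change".
val df2 = df.withColumn("row_num", row_number().over(Window.partitionBy($"id").orderBy($"datetime")))
val df3 = df2.withColumn("lag", lag($"open_close", 1).over(Window.partitionBy($"id").orderBy($"row_num")))
val df4 = df3.withColumn("change", when($"lag".isNotNull && $"open_close" =!= $"lag", 1).otherwise(0))
df4.groupBy($"id").agg(sum($"change")).show()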

+---+-----------+
| id|sum(change)|
+---+-----------+
|  1|          1|
|  3|          2|
+---+-----------+
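The lag-and-compare logic can also be checked without Spark. The following is only a sketch in plain Scala over the question's sample rows; `ChangeCount`, `Event`, `countChanges` and `changesPerId` are hypothetical names, and the pattern `M/d/yyyy H:mm` is an assumption about how the datetime strings are formatted. It also shows that the result depends on how the rows are ordered: sorting the datetime values as plain strings reproduces the counts above, while sorting them as real timestamps gives a different count for id 3.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object ChangeCount {
  // Sample rows from the question, reduced to the columns the count depends on.
  final case class Event(id: Int, datetime: String, openClose: String)

  val events: Seq[Event] = Seq(
    Event(1, "10/8/2019 0:00", "Closed"),
    Event(1, "2/12/2019 11:00", "Open"),
    Event(1, "9/12/2019 11:00", "Open"),
    Event(3, "5/17/2019 12:00", "Closed"),
    Event(3, "5/26/2019 10:00", "Open"),
    Event(3, "5/8/2019 3:00", "Open"),
    Event(3, "7/29/2019 7:00", "Closed")
  )

  // Explicit ordering so sortBy works on LocalDateTime across Scala versions.
  implicit val dateTimeOrdering: Ordering[LocalDateTime] =
    Ordering.fromLessThan(_ isBefore _)

  // Assumed format of the datetime strings shown in the question.
  val fmt: DateTimeFormatter = DateTimeFormatter.ofPattern("M/d/yyyy H:mm")

  // Pair every value with the one before it (the "lag") and count the
  // pairs that differ. The first value has no predecessor and never counts.
  def countChanges(values: Seq[String]): Int =
    values.zip(values.drop(1)).count { case (prev, cur) => prev != cur }

  // Group by id, order each group by the supplied key, then count changes.
  def changesPerId[K: Ordering](key: Event => K): Map[Int, Int] =
    events.groupBy(_.id).map { case (id, group) =>
      id -> countChanges(group.sortBy(key).map(_.openClose))
    }

  // Ordering by the raw string is lexicographic ("5/26/..." sorts before "5/8/...").
  val byString: Map[Int, Int] = changesPerId(_.datetime)

  // Ordering by the parsed value is chronological.
  val byTime: Map[Int, Int] = changesPerId(e => LocalDateTime.parse(e.datetime, fmt))

  def main(args: Array[String]): Unit = {
    println(s"ordered as strings:    $byString") // id 1 -> 1, id 3 -> 2
    println(s"ordered as timestamps: $byTime")   // id 1 -> 1, id 3 -> 3
  }
}
```

If the datetime column in the real DataFrame is stored as a string, one option is to order the window by a parsed value instead, for example `to_timestamp($"datetime", "M/d/yyyy H:mm")`, so that changes are counted in chronological order. Whether that is needed depends on the column's actual type, which the question does not state.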
