Doing a sum of columns based on some complex logic in PySpark

Views: 177 | Published: 2020/9/4 21:36:05 | Tags: sql pandas apache-spark pyspark apache-spark-sql

Question

Here is the question from the attached image:

Table:

Row Col1    Col2    Col3    Result
1   10       20      100    30
2   20       40      200    60
3   30       60       0     240
4   40       70       0     180
5   30       80      50     110
6   25       35       0      65
7   10       20      60      30

So the Result column is calculated based on the rules below:

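If col3 > 0, then result = col1 + col2
If col3 = 0, then result = sum(col2) from the current row down to and including the next row where col3 > 0, plus col1 from that row (the one where col3 > 0)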

For example, for row 3, result = 60 + 70 + 80 + 30 (col1 from row 5, because col3 > 0 there) = 240. For row 4, result = 70 + 80 + 30 (col1 from row 5, because col3 > 0 there) = 180. The other rows are computed similarly.
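To make the rule concrete, here is a minimal plain-Python sketch (not PySpark) that applies it to the table above. It relies on one observation: for a row with col3 = 0, the result equals that row's col2 plus the result of the row directly below it. The function name `compute_results` and the handling of a trailing col3 = 0 row are assumptions made for illustration; the question does not define that last case.

```python
# Rows from the question's table: (row, col1, col2, col3).
rows = [
    (1, 10, 20, 100),
    (2, 20, 40, 200),
    (3, 30, 60, 0),
    (4, 40, 70, 0),
    (5, 30, 80, 50),
    (6, 25, 35, 0),
    (7, 10, 20, 60),
]


def compute_results(rows):
    """Apply the question's rule, scanning from the last row up to the first."""
    results = {}
    carried = None  # result of the row directly below the current one
    for row, col1, col2, col3 in reversed(rows):
        if col3 > 0:
            value = col1 + col2
        elif carried is not None:
            # col3 = 0: this row's col2 plus everything down to the closing row.
            value = col2 + carried
        else:
            # Assumption: no later row with col3 > 0 exists, so leave it undefined.
            value = None
        results[row] = value
        carried = value
    return results


results = compute_results(rows)
print([results[r] for r in sorted(results)])  # [30, 60, 240, 180, 110, 65, 30]
```

The printed values match the Result column in the table, which is a quick way to check any SQL or PySpark translation against the stated rule.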

Recommended answer

This answers (correctly, I might add) the original version of the question.

In SQL, you can express this using window functions. Use a cumulative sum to define the group and then an additional cumulative sum within each group:

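select t.*,
       (case when col3 <> 0 then col1 + col2
             -- running total within the group, walking from the last row upward
             else sum(col2 + case when col3 = 0 then col1 else 0 end) over (partition by grp order by row desc)
        end) as result
from (select t.*,
             -- grp: count of rows with col3 <> 0 from the last row up to this one
             sum(case when col3 <> 0 then 1 else 0 end) over (order by row desc) as grp
      from t
     ) t;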

Here is a db<>fiddle (which uses Postgres).
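The grouping step may be easier to follow outside SQL. The sketch below is plain Python, used only as an illustration, and reproduces what the inner `sum(case when col3 <> 0 then 1 else 0 end) over (order by row desc)` yields for the col3 values in the question's table. The variable names are hypothetical.

```python
from itertools import accumulate

col3 = [100, 200, 0, 0, 50, 0, 60]  # rows 1..7 from the question's table

# Walk from the last row upward, counting the rows with col3 != 0 seen so far.
flags = [1 if value != 0 else 0 for value in reversed(col3)]
grp_desc = list(accumulate(flags))

# Put the labels back in row order 1..7.
grp = list(reversed(grp_desc))
print(grp)  # [4, 3, 2, 2, 2, 1, 1]
```

Each run of col3 = 0 rows shares a label with the col3 > 0 row that follows it (rows 3 to 5 get 2, rows 6 and 7 get 1), which is what lets the outer query partition by `grp`.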

Notes:

Your description says the else logic should be:

else sum(col2) over (partition by grp order by row desc)

Your example says:

else sum(col2 + col3) over (partition by grp order by row desc)

And in my opinion, this seems most logical:

else sum(col1 + col2) over (partition by grp order by row desc)
