Conditional aggregation using PySpark

Views: 35 Published: 2021/11/14 22:12:39 Tags: python apache-spark pyspark apache-spark-sql


Question

Consider the following DataFrame:

a        b  c   d   e  
africa  123 1   10  121.2
africa  123 1   10  321.98
africa  123 2   12  43.92
africa  124 2   12  43.92
usa     121 1   12  825.32
usa     121 1   12  89.78
usa     123 2   10  32.24
usa     123 5   21  43.92
canada  132 2   13  63.21
canada  132 2   13  89.23
canada  132 3   21  85.32
canada  131 3   10  43.92

Now I want to convert the CASE statement below into an equivalent PySpark statement using DataFrames.

We could run this CASE statement directly through HiveContext/SQLContext, but I am looking for the equivalent PySpark DataFrame query.

select 
case 
    when c <=10 then sum(e)
    when c between 10 and 20 then avg(e)
else 0.00 end 
from table 
group by a,b,c,d

Regards, Anvesh

Recommended answer

You can translate your SQL code directly into DataFrame primitives:

from pyspark.sql.functions import when, sum, avg, col  # note: this sum shadows Python's built-in sum

(df
    .groupBy("a", "b", "c", "d")  # group by a,b,c,d
    .agg(  # select
        when(col("c") <= 10, sum("e"))  # when c <= 10 then sum(e)
            .when(col("c").between(10, 20), avg("e"))  # when c between 10 and 20 then avg(e)
            .otherwise(0.0)))  # else 0.00
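
To sanity-check what the query should return, here is a minimal plain-Python sketch of the same CASE logic applied to the sample rows from the question. It is not part of the original answer and does not use Spark; the names `rows`, `conditional_aggregate`, and `expected` are made up for illustration.

```python
from collections import defaultdict

# Sample rows from the question, as (a, b, c, d, e) tuples.
rows = [
    ("africa", 123, 1, 10, 121.2),
    ("africa", 123, 1, 10, 321.98),
    ("africa", 123, 2, 12, 43.92),
    ("africa", 124, 2, 12, 43.92),
    ("usa", 121, 1, 12, 825.32),
    ("usa", 121, 1, 12, 89.78),
    ("usa", 123, 2, 10, 32.24),
    ("usa", 123, 5, 21, 43.92),
    ("canada", 132, 2, 13, 63.21),
    ("canada", 132, 2, 13, 89.23),
    ("canada", 132, 3, 21, 85.32),
    ("canada", 131, 3, 10, 43.92),
]


def conditional_aggregate(rows):
    """Hypothetical helper: group by (a, b, c, d), then apply the CASE rule to e."""
    groups = defaultdict(list)
    for a, b, c, d, e in rows:
        groups[(a, b, c, d)].append(e)

    result = {}
    for key, values in groups.items():
        c = key[2]  # c is a grouping column, so it is constant within a group
        if c <= 10:  # when c <= 10 then sum(e)
            result[key] = sum(values)
        elif 10 <= c <= 20:  # when c between 10 and 20 then avg(e)
            result[key] = sum(values) / len(values)
        else:  # else 0.00
            result[key] = 0.0
    return result


expected = conditional_aggregate(rows)
for key, value in sorted(expected.items()):
    print(key, round(value, 2))
```

Two things are worth noting. Every value of c in the sample data is at most 10, so each group here takes the sum branch. Also, BETWEEN is inclusive on both ends, but c = 10 is already caught by the first branch, so the average branch effectively applies to 10 < c <= 20.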
