groupby function on a calculated column


Question

I am joining multiple DataFrames and computing the output by multiplying two columns from two different DataFrames and dividing the result by a column that belongs to a third DataFrame. I get the errors "grouping sequence expression is empty" and "no_order is not an aggregate function". What is wrong with the code?

df = df1.join(df2,df2["Code"] == df1["Code"],how = 'left')\
.join(df3, df3["ID"] == df1["ID"],how = 'left')\
.join(df4, df4["ID"] == df1["ID"],how = 'left')\
.join(df5, df5["Scenario"] == df1["Status"],how='left')\
.withColumn("Country",when(df1.Ind == 1,"WI"))\
.withColumn("Country",when(df1.Ind == 0,"AA"))\
.withColumn("Year",when(df1.Year == "2020","2021"))\
.agg((sum(df5["amt"] * df1["cost"]))/df2["no_order"]).alias('output')
.groupby('Country','Year','output')

Recommended answer

The error tells you that df2["no_order"] must be inside an aggregate function, for example the sum you are already using for df5["amt"] * df1["cost"]. Also move .groupby() above .agg(), so that the grouping is defined before the aggregation runs.
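The rule above can be illustrated without Spark. The following is a minimal plain-Python sketch of what a grouped aggregation has to do, not how Spark implements it. The rows, their values, and the helper `group_rows` are made up for illustration; only the column names come from the question.

```python
from collections import defaultdict

# Made-up joined rows; column names follow the question, values are illustrative.
rows = [
    {"Country": "WI", "Year": "2021", "amt": 10.0, "cost": 2.0, "no_order": 4},
    {"Country": "WI", "Year": "2021", "amt": 6.0, "cost": 1.0, "no_order": 2},
    {"Country": "AA", "Year": "2021", "amt": 8.0, "cost": 3.0, "no_order": 6},
]


def group_rows(rows, keys):
    """Hypothetical helper: bucket rows by the grouping key, as groupby() does conceptually."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return dict(groups)


groups = group_rows(rows, ["Country", "Year"])

# Each group must collapse to a single output row. A bare no_order is
# ambiguous because one group can hold several different values of it.
no_order_per_group = {
    key: sorted({r["no_order"] for r in grp}) for key, grp in groups.items()
}

# Wrapping the whole expression in sum() reduces it to one value per group.
output = {
    key: sum(r["amt"] * r["cost"] / r["no_order"] for r in grp)
    for key, grp in groups.items()
}

print(no_order_per_group)
print(output)
```

The ("WI", "2021") group contains two distinct no_order values, which is why an expression that mentions no_order outside any aggregate has no single well-defined result for that group.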

If I understood correctly what you are trying to achieve, the code should look like this:

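# sum and when come from pyspark.sql.functions (this sum shadows Python's built-in sum)
from pyspark.sql.functions import sum, when

# Group first, then aggregate; every column used inside agg() sits within sum()
df = df1\
.join(df2, on = 'Code', how = 'left')\
.join(df3, on = 'ID', how = 'left')\
.join(df4, on = 'ID', how = 'left')\
.join(df5, df5.Scenario == df1.Status, how='left')\
.withColumn('Country', when(df1.Ind == 1,"WI").when(df1.Ind == 0,"AA"))\
.withColumn('Year', when(df1.Year == "2020","2021"))\
.groupby('Country','Year')\
.agg(sum(df5["amt"] * df1["cost"] / df2["no_order"]).alias('output'))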
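One point worth checking against your intent: the code above divides row by row and then sums, whereas the expression in the question sums amt * cost first and divides afterwards. The two generally give different results. The plain-Python sketch below compares them on one made-up group; the numbers, and the choice of sum as the reducer for no_order in the second form, are assumptions for illustration only.

```python
# One made-up group of joined rows: (amt, cost, no_order).
group = [(10.0, 2.0, 4.0), (6.0, 1.0, 2.0)]

# Form used in the answer: divide row by row, then add up.
sum_of_ratios = sum(amt * cost / n for amt, cost, n in group)

# Closer to the question's original expression: add up first, then divide
# by an aggregated no_order (summing it here is an assumption).
ratio_of_sums = sum(amt * cost for amt, cost, _ in group) / sum(
    n for _, _, n in group
)

print(sum_of_ratios, ratio_of_sums)
```

If the second form is what you need, the PySpark equivalent would keep both parts inside aggregates, for example sum(df5["amt"] * df1["cost"]) / sum(df2["no_order"]) within .agg().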
