跨行对 Pandas 数据框进行分组 [英] Grouping Pandas dataframe across rows
本文介绍了跨行对 Pandas 数据框进行分组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我有一个这样的 csv:
client1,client2,client3,client4,client5,client6,amount,,,Comp1,,,,4.475000,,,Comp2,,,,16.305584,,,Comp3,,,,4.050000Comp2,Comp1,,Comp4,,,,21.000000,,,Comp4,,,,30.000000,Comp1,,Comp2,,,,5.137500,,,Comp3,,,,52.650000,,,Comp1,,,,2.650000Comp3,,,Comp3,,,,29.000000Comp5,,,Comp2,,,,20.809000Comp5,,,Comp2,,,,15.100000Comp5,,,Comp2,,,,52.404000
在将其读入 Pandas 数据帧 df 后,我想分两步进行聚合:
第一步:
首先,我将金额相加:
client1 client2 client3 client4 client5 client6 金额Comp1 7.125000Comp2 16.305584Comp3 56.700000Comp4 30.000000Comp1 Comp2 5.137500Comp2 Comp1 Comp4 21.000000Comp3 Comp3 29.000000Comp5 Comp2 88.313000
然后,我想按每个客户端名称进行聚合,这样如果像第 5 组一样涉及多个客户端,则 5.1375 必须在 Comp1 和 Comp2 之间平均分配.试过这样:
df.groupby(['client1','client2','client3','client4','client5','client6']).apply(lambda x: x['amount'].sum()/len(x) if x.any().nunique()>=1 else x['amount'].sum())客户端1 客户端2 客户端3 客户端4 客户端5 客户端6 00 Comp1 3.5625001 Comp2 16.3055842 Comp3 28.3500003 Comp4 30.0000004 Comp1 Comp2 5.1375005 Comp2 Comp1 Comp4 21.0000006 Comp3 Comp3 29.0000007 Comp5 Comp2 29.437667
预期输出为:
客户金额比较 1 4.475+21/3+5.1375/2+2.65 = 16.69375Comp2 16.305584+21/3+20.809/2+15.10/2+52.404/2 = 67.462084Comp3 4.05+52.65+29 = 85.7Comp4 21/3+30 = 37比较 5 20.809/2+15.10/2+52.404/2 = 44.1565
我尝试使用 sum(axis=0)
但没有用.
解决方案
我们可以在这里使用一点数学
cols = ['金额']# 将金额除以非空字段df['new'] = df['amount']/df.drop(cols,1).notnull().sum(1)#通过删除amount列,unstack并删除nans将索引设置为新的.x = df.drop(cols,1).set_index('new').unstack().dropna()#仅从数量和客户端创建数据框ndf = pd.DataFrame({'amount':x.index.droplevel(0).values,'clients':x.values})#Groupby 客户端并获取总和ndf.groupby('clients').sum()
输出:
<前>数量客户Comp1 16.360417Comp2 69.697501Comp3 85.700000Comp4 36.666667Comp5 44.156500I've a csv like this:
client1,client2,client3,client4,client5,client6,amount
,,,Comp1,,,4.475000
,,,Comp2,,,16.305584
,,,Comp3,,,4.050000
Comp2,Comp1,,Comp4,,,21.000000
,,,Comp4,,,30.000000
,Comp1,,Comp2,,,5.137500
,,,Comp3,,,52.650000
,,,Comp1,,,2.650000
Comp3,,,Comp3,,,29.000000
Comp5,,,Comp2,,,20.809000
Comp5,,,Comp2,,,15.100000
Comp5,,,Comp2,,,52.404000
After reading it into a pandas dataframe, df, I wanted to aggregate in two steps:
Step1:
First, I sum the amount:
client1 client2 client3 client4 client5 client6 amount
Comp1 7.125000
Comp2 16.305584
Comp3 56.700000
Comp4 30.000000
Comp1 Comp2 5.137500
Comp2 Comp1 Comp4 21.000000
Comp3 Comp3 29.000000
Comp5 Comp2 88.313000
Then, I want to aggregate by each client name such that if multiple clients are involved like in group 5, then 5.1375 must be split equally between Comp1 and Comp2. Tried it this way:
df.groupby(['client1','client2','client3','client4','client5','client6']).apply(lambda x: x['amount'].sum()/len(x) if x.any().nunique()>=1 else x['amount'].sum())
client1 client2 client3 client4 client5 client6 0
0 Comp1 3.562500
1 Comp2 16.305584
2 Comp3 28.350000
3 Comp4 30.000000
4 Comp1 Comp2 5.137500
5 Comp2 Comp1 Comp4 21.000000
6 Comp3 Comp3 29.000000
7 Comp5 Comp2 29.437667
Expected Output is:
Client Amount
Comp1 4.475+21/3+5.1375/2+2.65 = 16.69375
Comp2 16.305584+21/3+20.809/2+15.10/2+52.404/2 = 67.462084
Comp3 4.05+52.65+29 = 85.7
Comp4 21/3+30 = 37
Comp5 20.809/2+15.10/2+52.404/2 = 44.1565
I tried using sum(axis=0)
but of no use.
解决方案
We can use a bit a maths here
cols = ['amount']
# Divide the amount by non null fields
df['new'] = df['amount']/df.drop(cols,1).notnull().sum(1)
#Set the index as new by droping amount column, unstack and drop the nans.
x = df.drop(cols,1).set_index('new').unstack().dropna()
#Create dataframe just from amount and the clients
ndf = pd.DataFrame({'amount':x.index.droplevel(0).values,'clients':x.values})
#Groupby client and get the sum
ndf.groupby('clients').sum()
Output:
amount clients Comp1 16.360417 Comp2 69.697501 Comp3 85.700000 Comp4 36.666667 Comp5 44.156500
这篇关于跨行对 Pandas 数据框进行分组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文