Python Pandas average based on condition into new column
Question
I have a pandas dataframe containing the following data:
matchID server court speed
1 1 A 100
1 2 D 200
1 3 D 300
1 4 A 100
1 1 A 120
1 2 A 250
1 3 D 110
1 4 D 100
2 1 A 100
2 2 D 200
2 3 D 300
2 4 A 100
2 1 A 120
2 2 A 250
2 3 D 110
2 4 D 100
I would like to add two new columns containing a mean based on two conditions. The column meanSpeedCourtA13 shall contain the mean speed of servers 1 and 3 where court = A. This would be (100 + 120) / 2 = 110. The second column, meanSpeedCourtD13, shall contain the mean speed of servers 1 and 3 where court = D. This would be (300 + 110) / 2 = 205.
Please note that this should be done for each matchID, so a groupby is also required. This means that solutions containing iloc() cannot be used.
The resulting dataframe should look as follows:
matchID server court speed meanSpeedCourtA13 meanSpeedCourtD13
1 1 A 100 110 205
1 2 D 200 110 205
1 3 D 300 110 205
1 4 A 100 110 205
1 1 A 120 110 205
1 2 A 250 110 205
1 3 D 110 110 205
1 4 D 100 110 205
2 1 A 100 110 205
2 2 D 200 110 205
2 3 D 300 110 205
2 4 A 100 110 205
2 1 A 120 110 205
2 2 A 250 110 205
2 3 D 110 110 205
2 4 D 100 110 205
Recommended answer
OK, this got a bit more complicated. Normally I'd try something with transform, but I'd be glad if someone had something better than the following:
Use groupby and send each group's dataframe to func, where df.loc selects the matching rows; lastly, use pd.concat to glue the dataframe together again:
import pandas as pd

# Sample data from the question
data = {'matchID': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 2, 9: 2, 10: 2,
                    11: 2, 12: 2, 13: 2, 14: 2, 15: 2},
        'court': {0: 'A', 1: 'D', 2: 'D', 3: 'A', 4: 'A', 5: 'A', 6: 'D', 7: 'D', 8: 'A',
                  9: 'D', 10: 'D', 11: 'A', 12: 'A', 13: 'A', 14: 'D', 15: 'D'},
        'speed': {0: 100, 1: 200, 2: 300, 3: 100, 4: 120, 5: 250, 6: 110, 7: 100, 8: 100,
                  9: 200, 10: 300, 11: 100, 12: 120, 13: 250, 14: 110, 15: 100},
        'server': {0: 1, 1: 2, 2: 3, 3: 4, 4: 1, 5: 2, 6: 3, 7: 4, 8: 1, 9: 2, 10: 3,
                   11: 4, 12: 1, 13: 2, 14: 3, 15: 4}}
df = pd.DataFrame(data)

def func(dfx):
    # Work on a copy so the group slice handed over by groupby is not modified in place
    dfx = dfx.copy()
    # Rows served by server 1 or 3 within this match
    servers13 = dfx.server.isin((1, 3))
    # Assigning a scalar mean broadcasts it to every row of the group
    dfx['meanSpeedCourtA13'] = dfx.loc[servers13 & (dfx.court == 'A'), 'speed'].mean()
    dfx['meanSpeedCourtD13'] = dfx.loc[servers13 & (dfx.court == 'D'), 'speed'].mean()
    return dfx

# Process each matchID separately, then stitch the groups back together
newdf = pd.concat(func(dfx) for _, dfx in df.groupby('matchID'))
print(newdf)
This returns:
court matchID server speed meanSpeedCourtA13 meanSpeedCourtD13
0 A 1 1 100 110.00 205.00
1 D 1 2 200 110.00 205.00
2 D 1 3 300 110.00 205.00
3 A 1 4 100 110.00 205.00
4 A 1 1 120 110.00 205.00
5 A 1 2 250 110.00 205.00
6 D 1 3 110 110.00 205.00
7 D 1 4 100 110.00 205.00
8 A 2 1 100 110.00 205.00
9 D 2 2 200 110.00 205.00
10 D 2 3 300 110.00 205.00
11 A 2 4 100 110.00 205.00
12 A 2 1 120 110.00 205.00
13 A 2 2 250 110.00 205.00
14 D 2 3 110 110.00 205.00
15 D 2 4 100 110.00 205.00
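The answer mentions that transform would normally be the first thing to try. As a minimal sketch of that idea (not the answerer's code), the speeds that do not meet the condition can be blanked out with Series.where before a grouped transform('mean'), which skips the resulting NaN values. The list-based construction of df below is only a compact rebuild of the sample data, and the loop over court letters is an illustrative choice.

```python
import pandas as pd

# Compact rebuild of the question's sample data (two identical matches)
df = pd.DataFrame({
    "matchID": [1] * 8 + [2] * 8,
    "server": [1, 2, 3, 4] * 4,
    "court": list("ADDAAADD") * 2,
    "speed": [100, 200, 300, 100, 120, 250, 110, 100] * 2,
})

# Rows served by server 1 or 3
servers13 = df["server"].isin([1, 3])

for court in ("A", "D"):
    mask = servers13 & (df["court"] == court)
    # where() keeps speed on matching rows and puts NaN elsewhere;
    # the grouped mean ignores NaN and transform broadcasts it to every row of the match
    df[f"meanSpeedCourt{court}13"] = (
        df["speed"].where(mask).groupby(df["matchID"]).transform("mean")
    )

print(df)
```

This keeps the original row order and index and avoids the explicit pd.concat step. A match with no qualifying rows for a court would get NaN in that column.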