pandas ,将多个列的多个函数应用于groupby对象 [英] pandas, apply multiple functions of multiple columns to groupby object
问题描述
我想将多个列的多个函数应用于groupby对象,这会导致新的 pandas.DataFrame
。
by_user = lasts.groupby('user')
elapsed_days = by_user.apply(lambda x:(x.elapsed_time * x.num_cores).sum()/ 86400)
running_days = by_user.apply(lambda x:(x.running_time * x.num_cores).sum ()/ 86400)
user_df = elapsed_days.to_frame('elapsed_days')。join(running_days.to_frame('running_days'))
结果为 user_df
为:
然而我怀疑有更好的方法,比如:
by_user.agg({'elapsed_days ':lambda x:(x.elapsed_time * x.num_cores).sum()/ 86400,
'running_days':lambda x:(x.run ning_time * x.num_cores).sum()/ 86400})
然而,这不起作用,因为AFAIK agg()
可用于 pandas.Series
。
我找到了这个问题和答案,但解决方案对我而言看起来相当难看,并且考虑到答案是将近四岁,现在可能会有更好的方法。
agg 或应用
,而是首先通过 mul
,然后 div
和上次使用 groupby
by index
with 汇总
sum
: <$ $ {$'$'''''''''''''''''''''' 50000,60000,90000],
'running_time':[30000,20000,30000,15000],
'num_cores':[7,8,9,4]})
print(lasts)
elapsed_time num_cores running_time user
0 40000 7 30000 a
1 50000 8 20000 s
2 60000 9 30000 d
3 90000 4 15000 d
by_user = lasts。 groupby('user')
elapsed_days = by_user.apply(lambda x:(x.elapsed_time * x.num_cores).sum()/ 86400)
print(elapsed_days)
running_days = by_user .apply(lambda x:(x.running_time * x.num_cores).sum()/ 86400)
user_df = elapsed_days.to_frame('elapsed_days')。加入(running_days.to_frame('running_days'))
print(user_df)
elapsed_days running_days
用户
a 3.240741 2.430556
d 10.416667 3.819444
s 4.629630 1.851852
lasts = lasts.set_index( '用户')
print(lasts [['elapsed_time','running_time']] .mul(lasts ['num_cores'],axis = 0)
.div(86400)
。 groupby(level = 0)
.sum())
elapsed_time running_time
user
a 3.240741 2.430556
d 10.416667 3.819444
s 4.629630 1.851852
I want to apply multiple functions of multiple columns to a groupby object which results in a new pandas.DataFrame
.
I know how to do it in seperate steps:
by_user = lasts.groupby('user')
elapsed_days = by_user.apply(lambda x: (x.elapsed_time * x.num_cores).sum() / 86400)
running_days = by_user.apply(lambda x: (x.running_time * x.num_cores).sum() / 86400)
user_df = elapsed_days.to_frame('elapsed_days').join(running_days.to_frame('running_days'))
Which results in user_df
being:
However I suspect that there is a better way, like:
by_user.agg({'elapsed_days': lambda x: (x.elapsed_time * x.num_cores).sum() / 86400,
'running_days': lambda x: (x.running_time * x.num_cores).sum() / 86400})
However, this doesn't work, because AFAIK agg()
works on pandas.Series
.
I did find this question and answer, but the solutions look rather ugly to me, and considering that the answer is nearly four years old, there might be a better way by now.
I think you can avoid agg
or apply
and rather first multiple by mul
, then div
and last use groupby
by index
with aggregating
sum
:
lasts = pd.DataFrame({'user':['a','s','d','d'],
'elapsed_time':[40000,50000,60000,90000],
'running_time':[30000,20000,30000,15000],
'num_cores':[7,8,9,4]})
print (lasts)
elapsed_time num_cores running_time user
0 40000 7 30000 a
1 50000 8 20000 s
2 60000 9 30000 d
3 90000 4 15000 d
by_user = lasts.groupby('user')
elapsed_days = by_user.apply(lambda x: (x.elapsed_time * x.num_cores).sum() / 86400)
print (elapsed_days)
running_days = by_user.apply(lambda x: (x.running_time * x.num_cores).sum() / 86400)
user_df = elapsed_days.to_frame('elapsed_days').join(running_days.to_frame('running_days'))
print (user_df)
elapsed_days running_days
user
a 3.240741 2.430556
d 10.416667 3.819444
s 4.629630 1.851852
lasts = lasts.set_index('user')
print (lasts[['elapsed_time','running_time']].mul(lasts['num_cores'], axis=0)
.div(86400)
.groupby(level=0)
.sum())
elapsed_time running_time
user
a 3.240741 2.430556
d 10.416667 3.819444
s 4.629630 1.851852
这篇关于 pandas ,将多个列的多个函数应用于groupby对象的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!