Apply different functions to different items in group object: Python pandas
Question
Suppose I have a dataframe as follows:
In [1]: test_dup_df
Out[1]:
                     exe_price  exe_vol flag
2008-03-13 14:41:07       84.5      200  yes
2008-03-13 14:41:37       85.0    10000  yes
2008-03-13 14:41:38       84.5    69700  yes
2008-03-13 14:41:39       84.5     1200  yes
2008-03-13 14:42:00       84.5     1000  yes
2008-03-13 14:42:08       84.5      300  yes
2008-03-13 14:42:10       84.5    88100  yes
2008-03-13 14:42:10       84.5    11900  yes
2008-03-13 14:42:15       84.5     5000  yes
2008-03-13 14:42:16       84.5     3200  yes
I want to group the duplicated rows at time 14:42:10 and apply different functions to exe_price and exe_vol (e.g., sum exe_vol and compute the volume-weighted average of exe_price). I know that I can do
In [2]: grouped = test_dup_df.groupby(level=0)
to group the duplicate indices and then use the first() or last() functions to get either the first or the last row of each group, but this is not really what I want.
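To make the limitation concrete, here is a minimal sketch of what first() and last() return for a duplicated timestamp. The frame below is a four-row subset rebuilt by hand from the data above, and the names df, first_rows and last_rows are only illustrative.

```python
import pandas as pd

# Four rows copied from the question; 14:42:10 appears twice.
idx = pd.to_datetime([
    "2008-03-13 14:42:08",
    "2008-03-13 14:42:10",
    "2008-03-13 14:42:10",
    "2008-03-13 14:42:15",
])
df = pd.DataFrame(
    {
        "exe_price": [84.5, 84.5, 84.5, 84.5],
        "exe_vol": [300, 88100, 11900, 5000],
        "flag": ["yes"] * 4,
    },
    index=idx,
)

grouped = df.groupby(level=0)

# Each call keeps exactly one of the duplicated rows and discards the other.
first_rows = grouped.first()  # exe_vol at 14:42:10 is 88100
last_rows = grouped.last()    # exe_vol at 14:42:10 is 11900
```

Either way one of the two trades is dropped, so neither the summed volume nor a volume-weighted price can be recovered from the result.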
Is there a way to group and then apply different functions (written by me) to the values in different columns?
Recommended answer
Apply your own function:
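In [11]: import pandas as pd
In [12]: def func(x):
   ....:     # Volume-weighted average price of the rows sharing one timestamp
   ....:     exe_price = (x['exe_price'] * x['exe_vol']).sum() / x['exe_vol'].sum()
   ....:     exe_vol = x['exe_vol'].sum()  # total volume traded at that timestamp
   ....:     flag = True
   ....:     return pd.Series([exe_price, exe_vol, flag], index=['exe_price', 'exe_vol', 'flag'])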
In [13]: test_dup_df.groupby(test_dup_df.index).apply(func)
Out[13]:
                     exe_price  exe_vol  flag
date_time
2008-03-13 14:41:07       84.5      200  True
2008-03-13 14:41:37         85    10000  True
2008-03-13 14:41:38       84.5    69700  True
2008-03-13 14:41:39       84.5     1200  True
2008-03-13 14:42:00       84.5     1000  True
2008-03-13 14:42:08       84.5      300  True
2008-03-13 14:42:10       84.5   100000  True
2008-03-13 14:42:15       84.5     5000  True
2008-03-13 14:42:16       84.5     3200  True
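For anyone who wants to try this without the original data file, the following is a self-contained sketch that rebuilds the frame from the question by hand and applies the same idea. The function name vwap_and_volume and the intermediate names notional, sums and fast are illustrative, not from the answer above. The second half shows an alternative that avoids a Python-level function per group by summing price times volume first; that is my own variation, not the answerer's method.

```python
import pandas as pd

# Rebuild the question's frame by hand.
idx = pd.to_datetime([
    "2008-03-13 14:41:07", "2008-03-13 14:41:37", "2008-03-13 14:41:38",
    "2008-03-13 14:41:39", "2008-03-13 14:42:00", "2008-03-13 14:42:08",
    "2008-03-13 14:42:10", "2008-03-13 14:42:10", "2008-03-13 14:42:15",
    "2008-03-13 14:42:16",
])
test_dup_df = pd.DataFrame(
    {
        "exe_price": [84.5, 85.0, 84.5, 84.5, 84.5, 84.5, 84.5, 84.5, 84.5, 84.5],
        "exe_vol": [200, 10000, 69700, 1200, 1000, 300, 88100, 11900, 5000, 3200],
        "flag": ["yes"] * 10,
    },
    index=idx,
)
test_dup_df.index.name = "date_time"


def vwap_and_volume(group):
    """Collapse one timestamp's rows into a VWAP and a total volume."""
    total_vol = group["exe_vol"].sum()
    vwap = (group["exe_price"] * group["exe_vol"]).sum() / total_vol
    return pd.Series({"exe_price": vwap, "exe_vol": total_vol})


# Approach 1: custom function per group, as in the answer.
result = (
    test_dup_df.groupby(level=0)[["exe_price", "exe_vol"]].apply(vwap_and_volume)
)

# Approach 2 (assumed alternative): sum price * volume and volume per group,
# then divide, which needs only built-in aggregations.
notional = test_dup_df["exe_price"] * test_dup_df["exe_vol"]
sums = (
    pd.DataFrame({"notional": notional, "exe_vol": test_dup_df["exe_vol"]})
    .groupby(level=0)
    .sum()
)
fast = pd.DataFrame(
    {"exe_price": sums["notional"] / sums["exe_vol"], "exe_vol": sums["exe_vol"]}
)
```

Both rows at 14:42:10 trade at 84.5, so the merged row has exe_price 84.5 and exe_vol 100000 in either approach. Note that in the first approach exe_vol comes back as a float, because the returned Series holds both values in a single dtype.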