Apply different functions to different items in group object: Python pandas
Problem description
Suppose I have a dataframe as follows:
In [1]: test_dup_df
Out[1]:
exe_price exe_vol flag
2008-03-13 14:41:07 84.5 200 yes
2008-03-13 14:41:37 85.0 10000 yes
2008-03-13 14:41:38 84.5 69700 yes
2008-03-13 14:41:39 84.5 1200 yes
2008-03-13 14:42:00 84.5 1000 yes
2008-03-13 14:42:08 84.5 300 yes
2008-03-13 14:42:10 84.5 88100 yes
2008-03-13 14:42:10 84.5 11900 yes
2008-03-13 14:42:15 84.5 5000 yes
2008-03-13 14:42:16 84.5 3200 yes
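To try the code in this thread, the sample frame can be rebuilt from the table above. This is a minimal sketch that assumes the timestamps form a plain `DatetimeIndex` (the original construction of `test_dup_df` is not shown). It also lists the rows whose timestamp occurs more than once.

```python
import pandas as pd

# Rebuild the sample frame from the table; 14:42:10 appears twice in the index.
times = pd.to_datetime([
    "2008-03-13 14:41:07", "2008-03-13 14:41:37", "2008-03-13 14:41:38",
    "2008-03-13 14:41:39", "2008-03-13 14:42:00", "2008-03-13 14:42:08",
    "2008-03-13 14:42:10", "2008-03-13 14:42:10", "2008-03-13 14:42:15",
    "2008-03-13 14:42:16",
])
test_dup_df = pd.DataFrame(
    {
        "exe_price": [84.5, 85.0, 84.5, 84.5, 84.5, 84.5, 84.5, 84.5, 84.5, 84.5],
        "exe_vol": [200, 10000, 69700, 1200, 1000, 300, 88100, 11900, 5000, 3200],
        "flag": ["yes"] * 10,
    },
    index=times,
)

# keep=False marks every member of a duplicated timestamp, not just the repeats
dups = test_dup_df[test_dup_df.index.duplicated(keep=False)]
print(dups)
```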
I want to group the duplicate rows at time 14:42:10 and apply different functions to exe_price and exe_vol (e.g., sum exe_vol and compute the volume-weighted average of exe_price). I know that I can do
In [2]: grouped = test_dup_df.groupby(level=0)
to group the duplicate indices and then use the first() or last() functions to get either the first or the last row, but this is not really what I want.
Is there a way to group and then apply different functions (written by me) to the values in different columns?
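When each column's result depends only on that column, `DataFrameGroupBy.agg` accepts a dict mapping column names to functions, and a user-written function can appear in that dict. A minimal sketch on a three-row subset of the data; `total_volume` is a hypothetical helper added only to show a custom function. Note its limit: each function sees a single column, so a volume-weighted price (which needs both price and volume) cannot be expressed this way.

```python
import pandas as pd

idx = pd.to_datetime(
    ["2008-03-13 14:42:08", "2008-03-13 14:42:10", "2008-03-13 14:42:10"]
)
df = pd.DataFrame(
    {"exe_price": [84.5, 84.5, 84.5], "exe_vol": [300, 88100, 11900], "flag": ["yes"] * 3},
    index=idx,
)

def total_volume(vol):
    # Hypothetical user-written aggregator: receives one column (a Series) per group
    return vol.sum()

# One function per column; "mean" here is a plain, unweighted mean of the price
out = df.groupby(level=0).agg(
    {"exe_price": "mean", "exe_vol": total_volume, "flag": "first"}
)
print(out)
```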
Recommended answer
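Apply your own function. `apply` passes each group to the function as a DataFrame, so the function can combine several columns and return one Series per group: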
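In [12]: import pandas as pd
   ...: def func(x):
   ...:     # volume-weighted average price of this group
   ...:     exe_price = (x['exe_price'] * x['exe_vol']).sum() / x['exe_vol'].sum()
   ...:     # total traded volume of this group
   ...:     exe_vol = x['exe_vol'].sum()
   ...:     flag = True
   ...:     return pd.Series([exe_price, exe_vol, flag], index=['exe_price', 'exe_vol', 'flag'])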
In [13]: test_dup_df.groupby(test_dup_df.index).apply(func)
Out[13]:
exe_price exe_vol flag
date_time
2008-03-13 14:41:07 84.5 200 True
2008-03-13 14:41:37 85 10000 True
2008-03-13 14:41:38 84.5 69700 True
2008-03-13 14:41:39 84.5 1200 True
2008-03-13 14:42:00 84.5 1000 True
2008-03-13 14:42:08 84.5 300 True
2008-03-13 14:42:10 84.5 100000 True
2008-03-13 14:42:15 84.5 5000 True
2008-03-13 14:42:16 84.5 3200 True