Apply different functions to different items in group object: Python pandas

Problem description

Suppose I have a dataframe as follows:

In [1]: test_dup_df

Out[1]:
                  exe_price exe_vol flag 
2008-03-13 14:41:07  84.5    200     yes
2008-03-13 14:41:37  85.0    10000   yes
2008-03-13 14:41:38  84.5    69700   yes
2008-03-13 14:41:39  84.5    1200    yes
2008-03-13 14:42:00  84.5    1000    yes
2008-03-13 14:42:08  84.5    300     yes
2008-03-13 14:42:10  84.5    88100   yes
2008-03-13 14:42:10  84.5    11900   yes
2008-03-13 14:42:15  84.5    5000    yes
2008-03-13 14:42:16  84.5    3200    yes 

I want to group the duplicate rows at time 14:42:10 and apply different functions to exe_price and exe_vol (e.g., sum exe_vol and compute the volume-weighted average of exe_price). I know that I can do

In [2]: grouped = test_dup_df.groupby(level=0)

to group the duplicate indices and then use the first() or last() functions to get either the first or the last row, but this is not really what I want.
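To make the limitation concrete, here is a minimal sketch that rebuilds a few rows of the frame above. The construction code and the variable names `df`, `first_rows`, and `last_rows` are assumptions for illustration; only the values come from the question. It shows that first() and last() each keep a single row for the duplicated timestamp rather than combining the two.

```python
import pandas as pd

# Rebuild three rows of the example frame; the index contains a duplicate timestamp.
idx = pd.to_datetime([
    "2008-03-13 14:42:08",
    "2008-03-13 14:42:10",
    "2008-03-13 14:42:10",
])
df = pd.DataFrame(
    {
        "exe_price": [84.5, 84.5, 84.5],
        "exe_vol": [300, 88100, 11900],
        "flag": ["yes", "yes", "yes"],
    },
    index=idx,
)

grouped = df.groupby(level=0)
first_rows = grouped.first()  # keeps the 88100 row for 14:42:10
last_rows = grouped.last()    # keeps the 11900 row for 14:42:10
```

Neither result contains the combined volume of 100000, which is why a custom aggregation is needed.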

Is there a way to group and then apply different functions (written by me) to the values in different columns?

Recommended answer

Apply your own function:

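In [12]: from pandas import Series

         def func(x):
             # Volume-weighted average price of the rows in this group
             exe_price = (x['exe_price']*x['exe_vol']).sum() / x['exe_vol'].sum()
             # Total volume of the rows in this group
             exe_vol = x['exe_vol'].sum()
             flag = True
             return Series([exe_price, exe_vol, flag], index=['exe_price', 'exe_vol', 'flag'])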


In [13]: test_dup_df.groupby(test_dup_df.index).apply(func)
Out[13]:
                    exe_price exe_vol  flag
date_time                                  
2008-03-13 14:41:07      84.5     200  True 
2008-03-13 14:41:37        85   10000  True
2008-03-13 14:41:38      84.5   69700  True
2008-03-13 14:41:39      84.5    1200  True
2008-03-13 14:42:00      84.5    1000  True
2008-03-13 14:42:08      84.5     300  True
2008-03-13 14:42:10      84.5  100000  True
2008-03-13 14:42:15      84.5    5000  True
2008-03-13 14:42:16      84.5    3200  True
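As an alternative sketch (not part of the original answer), the same result can be reached without `apply` by precomputing price times volume and passing a per-column dict to `agg`. The helper column name `notional` and the variable names `work` and `out` are hypothetical, and the frame is rebuilt from a few rows of the question's data so the snippet runs on its own.

```python
import pandas as pd

# A few rows from the question, including the duplicated 14:42:10 timestamp.
idx = pd.to_datetime([
    "2008-03-13 14:42:08",
    "2008-03-13 14:42:10",
    "2008-03-13 14:42:10",
    "2008-03-13 14:42:15",
])
df = pd.DataFrame(
    {
        "exe_price": [84.5, 84.5, 84.5, 84.5],
        "exe_vol": [300, 88100, 11900, 5000],
    },
    index=idx,
)

# Helper column (hypothetical name): traded value of each row.
work = df.assign(notional=df["exe_price"] * df["exe_vol"])

# A dict passed to agg() selects a separate aggregation for each column.
out = work.groupby(level=0).agg({"notional": "sum", "exe_vol": "sum"})

# Volume-weighted average price = total traded value / total volume.
out["exe_price"] = out["notional"] / out["exe_vol"]
out = out[["exe_price", "exe_vol"]]
```

This form keeps the aggregation vectorized, which may matter on large frames; the `apply` approach in the answer is more flexible when the per-group logic cannot be expressed as column-wise reductions.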
