Pandas DataFrame conditional .mean() depending on values in certain columns

Views: 111 | Published: 2020/5/8 0:51:06 | Tags: python, pandas, conditional, mean

Question

I'm trying to create a new column that holds the mean of the values in an existing column of the same df. However, the mean should be computed per group, where the groups are defined by three other columns.

Out[184]: 
   YEAR daytype hourtype  scenario  option_value    
0  2015     SAT     of_h         0      0.134499       
1  2015     SUN     of_h         1     63.019250      
2  2015     WD      of_h         2     52.113516       
3  2015     WD      pk_h         3     43.126513       
4  2015     SAT     of_h         4     56.431392

Basically, I would like a new column 'mean' that holds the mean of "option_value" over all rows that share the same "YEAR", "daytype", and "hourtype".

I tried the following approach, but without success:

In [185]: o2['premium']=o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_value'].mean()

TypeError: incompatible index of inserted column with frame index

Recommended answer

Here's one way to do it:

In [19]: def cust_mean(grp):
   ....:     grp['mean'] = grp['option_value'].mean()
   ....:     return grp
   ....:

In [20]: o2.groupby(['YEAR', 'daytype', 'hourtype']).apply(cust_mean)
Out[20]:
   YEAR daytype hourtype  scenario  option_value       mean
0  2015     SAT     of_h         0      0.134499  28.282946
1  2015     SUN     of_h         1     63.019250  63.019250
2  2015      WD     of_h         2     52.113516  52.113516
3  2015      WD     pk_h         3     43.126513  43.126513
4  2015     SAT     of_h         4     56.431392  28.282946

So, what was going wrong with your attempt?

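It returns an aggregate whose shape differs from the original dataframe: one row per group, indexed by the group keys, instead of one row per original row. That index cannot be aligned with the frame's index, which is why the assignment fails.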

In [21]: o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_value'].mean()
Out[21]:
YEAR  daytype  hourtype
2015  SAT      of_h        28.282946
      SUN      of_h        63.019250
      WD       of_h        52.113516
               pk_h        43.126513
Name: option_value, dtype: float64
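
To make the shape mismatch concrete, here is a minimal sketch that rebuilds the sample frame from the rows shown in the question and inspects the aggregate. It also shows one possible way to attach the aggregate anyway, by joining it back on the key columns with DataFrame.join. That join is an alternative I am adding for illustration, not part of the original answer, and the names keys, group_means, and with_mean are made up for this example.

```python
import pandas as pd

# Sample frame rebuilt from the rows shown in the question.
o2 = pd.DataFrame({
    "YEAR": [2015, 2015, 2015, 2015, 2015],
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})

keys = ["YEAR", "daytype", "hourtype"]  # hypothetical helper name

# One row per group, indexed by a 3-level MultiIndex of the group keys.
group_means = o2.groupby(keys)["option_value"].mean()
print(group_means.index.nlevels, len(group_means), len(o2))

# Assumed alternative: a left join on the key columns copies each group's
# mean onto every row of that group and keeps o2's original index.
with_mean = o2.join(group_means.rename("mean"), on=keys)
print(with_mean)
```

The aggregate has fewer rows than o2 and a different kind of index, so it only lines up with the frame once it is matched on the key columns.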

Or use transform:

In [1461]: o2['premium'] = (o2.groupby(['YEAR', 'daytype', 'hourtype'])['option_value']
                              .transform('mean'))

In [1462]: o2
Out[1462]:
   YEAR daytype hourtype  scenario  option_value    premium
0  2015     SAT     of_h         0      0.134499  28.282946
1  2015     SUN     of_h         1     63.019250  63.019250
2  2015      WD     of_h         2     52.113516  52.113516
3  2015      WD     pk_h         3     43.126513  43.126513
4  2015     SAT     of_h         4     56.431392  28.282946
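
For anyone who wants to run the transform version outside an IPython session, a self-contained sketch follows. The frame is rebuilt by hand from the rows in the question, so the construction code is my assumption about how o2 looks, not the asker's actual data loading.

```python
import pandas as pd

# Sample frame rebuilt from the rows shown in the question.
o2 = pd.DataFrame({
    "YEAR": [2015, 2015, 2015, 2015, 2015],
    "daytype": ["SAT", "SUN", "WD", "WD", "SAT"],
    "hourtype": ["of_h", "of_h", "of_h", "pk_h", "of_h"],
    "scenario": [0, 1, 2, 3, 4],
    "option_value": [0.134499, 63.019250, 52.113516, 43.126513, 56.431392],
})

# transform('mean') returns one value per original row (the mean of that
# row's group), so the result shares o2's index and can be assigned directly.
premium = o2.groupby(["YEAR", "daytype", "hourtype"])["option_value"].transform("mean")
o2["premium"] = premium

print(premium.index.equals(o2.index))
print(o2)
```

Because the result is already aligned with the frame, no merge or custom function is needed here.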
