How to create a Python DataFrame containing the mean of some rows of another DataFrame

Problem description

I have a pandas DataFrame containing some values:

                        id  pair      value  subdir
taylor_1e3c_1s_56C  taylor  6_13  -0.398716    run1 
taylor_1e3c_1s_56C  taylor  6_13  -0.397820    run2 
taylor_1e3c_1s_56C  taylor  6_13  -0.397310    run3 
taylor_1e3c_1s_56C  taylor  6_13  -0.390520    run4 
taylor_1e3c_1s_56C  taylor  6_13  -0.377390    run5 
taylor_1e3c_1s_56C  taylor  8_11  -0.393604    run1
taylor_1e3c_1s_56C  taylor  8_11  -0.392899    run2
taylor_1e3c_1s_56C  taylor  8_11  -0.392473    run3
taylor_1e3c_1s_56C  taylor  8_11  -0.389959    run4
taylor_1e3c_1s_56C  taylor  8_11  -0.387946    run5
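
To make the snippets in the answer reproducible, here is a sketch that rebuilds the sample frame. It assumes the values are exactly the ones shown in the printout, that the index is unnamed (nothing appears above it in the printout), and that every row carries the same index label.

```python
import pandas as pd

# Rebuild the sample frame from the question's printout.
# All ten rows share the same (unnamed) index label.
df = pd.DataFrame(
    {
        "id": ["taylor"] * 10,
        "pair": ["6_13"] * 5 + ["8_11"] * 5,
        "value": [-0.398716, -0.397820, -0.397310, -0.390520, -0.377390,
                  -0.393604, -0.392899, -0.392473, -0.389959, -0.387946],
        "subdir": ["run1", "run2", "run3", "run4", "run5"] * 2,
    },
    index=["taylor_1e3c_1s_56C"] * 10,
)
print(df)
```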

What I would like to do is isolate the rows that share the same index, id, and pair, compute the mean of the value column for each such group, and put the results in a new DataFrame. Because this averages over all possible values of subdir, that column should also be removed. The output should look something like this:

                        id  pair      value
taylor_1e3c_1s_56C  taylor  6_13  -0.392351
taylor_1e3c_1s_56C  taylor  8_11  -0.391376

How should I do it in pandas?

Recommended answer

Use syntactic sugar: group the value Series by the index and the id and pair Series, then aggregate with mean:

df = df['value'].groupby([df.index, df['id'], df['pair']]).mean().reset_index(level=[1,2])
print (df)
                        id  pair     value
taylor_1e3c_1s_56C  taylor  6_13 -0.392351
taylor_1e3c_1s_56C  taylor  8_11 -0.391376
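
A self-contained sketch of the same approach, rebuilding the sample data from the question. The names means and out are introduced here for illustration so that the original df is not overwritten and stays available for the later snippets.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": ["taylor"] * 10,
        "pair": ["6_13"] * 5 + ["8_11"] * 5,
        "value": [-0.398716, -0.397820, -0.397310, -0.390520, -0.377390,
                  -0.393604, -0.392899, -0.392473, -0.389959, -0.387946],
        "subdir": ["run1", "run2", "run3", "run4", "run5"] * 2,
    },
    index=["taylor_1e3c_1s_56C"] * 10,
)

# Group the 'value' Series by three aligned keys: the index itself plus the
# 'id' and 'pair' columns. 'subdir' is never selected, so it drops out.
means = df["value"].groupby([df.index, df["id"], df["pair"]]).mean()

# Move levels 1 and 2 ('id' and 'pair') into columns; level 0 stays the index.
out = means.reset_index(level=[1, 2])
print(out)
```

Selecting the value column before grouping is what removes subdir; no explicit drop is needed.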

Classic solution: first use reset_index to turn the index into a column, then groupby the column names and aggregate with mean:

df = df.reset_index().groupby(['index','id','pair'])['value'].mean().reset_index(level=[1,2])
print (df)
                        id  pair     value
index                                     
taylor_1e3c_1s_56C  taylor  6_13 -0.392351
taylor_1e3c_1s_56C  taylor  8_11 -0.391376
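
As a variation not taken from the answer above, groupby also accepts a list that mixes an array-like key (here df.index) with column labels, which skips the initial reset_index. This is a sketch under the same sample data; the resulting index keeps its original (empty) name instead of becoming index.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": ["taylor"] * 10,
        "pair": ["6_13"] * 5 + ["8_11"] * 5,
        "value": [-0.398716, -0.397820, -0.397310, -0.390520, -0.377390,
                  -0.393604, -0.392899, -0.392473, -0.389959, -0.387946],
        "subdir": ["run1", "run2", "run3", "run4", "run5"] * 2,
    },
    index=["taylor_1e3c_1s_56C"] * 10,
)

# Mix the index (an array-like key) with column labels in one groupby call.
means = df.groupby([df.index, "id", "pair"])["value"].mean()

# Levels can be addressed by name; the unnamed first level stays as the index.
out = means.reset_index(level=["id", "pair"])
print(out)
```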

Details (here df is the original DataFrame from the question):

print (df.reset_index())
                index      id  pair     value subdir
0  taylor_1e3c_1s_56C  taylor  6_13 -0.398716   run1
1  taylor_1e3c_1s_56C  taylor  6_13 -0.397820   run2
2  taylor_1e3c_1s_56C  taylor  6_13 -0.397310   run3
3  taylor_1e3c_1s_56C  taylor  6_13 -0.390520   run4
4  taylor_1e3c_1s_56C  taylor  6_13 -0.377390   run5
5  taylor_1e3c_1s_56C  taylor  8_11 -0.393604   run1
6  taylor_1e3c_1s_56C  taylor  8_11 -0.392899   run2
7  taylor_1e3c_1s_56C  taylor  8_11 -0.392473   run3
8  taylor_1e3c_1s_56C  taylor  8_11 -0.389959   run4
9  taylor_1e3c_1s_56C  taylor  8_11 -0.387946   run5
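
The column is called index only because the original index has no name. A quick sketch to confirm this, and to show that naming the index first with rename_axis changes the column label. The name sample used below is an arbitrary, hypothetical choice.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": ["taylor"] * 10,
        "pair": ["6_13"] * 5 + ["8_11"] * 5,
        "value": [-0.398716, -0.397820, -0.397310, -0.390520, -0.377390,
                  -0.393604, -0.392899, -0.392473, -0.389959, -0.387946],
        "subdir": ["run1", "run2", "run3", "run4", "run5"] * 2,
    },
    index=["taylor_1e3c_1s_56C"] * 10,
)

# Unnamed index: reset_index falls back to the column label "index".
flat = df.reset_index()
print(flat.columns.tolist())

# Named index ("sample" is a hypothetical name): that name becomes the label.
named = df.rename_axis("sample").reset_index()
print(named.columns.tolist())
```

With a named index, the groupby keys would then be ['sample', 'id', 'pair'] instead of ['index', 'id', 'pair'].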

Aggregating with mean returns a Series with a 3-level MultiIndex:

print (df.reset_index().groupby(['index','id','pair'])['value'].mean())
index               id      pair
taylor_1e3c_1s_56C  taylor  6_13   -0.392351
                            8_11   -0.391376
Name: value, dtype: float64
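
To check the structure of this intermediate result directly, a small sketch that inspects the MultiIndex instead of relying on the printed layout (the variable name grouped is illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": ["taylor"] * 10,
        "pair": ["6_13"] * 5 + ["8_11"] * 5,
        "value": [-0.398716, -0.397820, -0.397310, -0.390520, -0.377390,
                  -0.393604, -0.392899, -0.392473, -0.389959, -0.387946],
        "subdir": ["run1", "run2", "run3", "run4", "run5"] * 2,
    },
    index=["taylor_1e3c_1s_56C"] * 10,
)

grouped = df.reset_index().groupby(["index", "id", "pair"])["value"].mean()

# One level per grouping key, named after the keys.
print(grouped.index.nlevels)  # 3
print(list(grouped.index.names))
```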

So reset_index is necessary to convert the second and third levels into columns:

print (df.reset_index()
        .groupby(['index','id','pair'])['value']
        .mean()
        .reset_index(level=[1,2]))
                        id  pair     value
index                                     
taylor_1e3c_1s_56C  taylor  6_13 -0.392351
taylor_1e3c_1s_56C  taylor  8_11 -0.391376
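
Another way to reach the same shape, offered as an alternative sketch rather than the answer's own method, is to keep the group keys as columns with as_index=False and then promote only the index column back to the index with set_index:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": ["taylor"] * 10,
        "pair": ["6_13"] * 5 + ["8_11"] * 5,
        "value": [-0.398716, -0.397820, -0.397310, -0.390520, -0.377390,
                  -0.393604, -0.392899, -0.392473, -0.389959, -0.387946],
        "subdir": ["run1", "run2", "run3", "run4", "run5"] * 2,
    },
    index=["taylor_1e3c_1s_56C"] * 10,
)

# as_index=False keeps 'index', 'id', 'pair' as ordinary columns,
# then set_index moves just 'index' back to the row index.
out = (df.reset_index()
         .groupby(["index", "id", "pair"], as_index=False)["value"]
         .mean()
         .set_index("index"))
print(out)
```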
