How to create a Python DataFrame containing the mean of some rows of another DataFrame
Question
I have a pandas DataFrame containing some values:
                        id  pair     value subdir
taylor_1e3c_1s_56C  taylor  6_13 -0.398716   run1
taylor_1e3c_1s_56C  taylor  6_13 -0.397820   run2
taylor_1e3c_1s_56C  taylor  6_13 -0.397310   run3
taylor_1e3c_1s_56C  taylor  6_13 -0.390520   run4
taylor_1e3c_1s_56C  taylor  6_13 -0.377390   run5
taylor_1e3c_1s_56C  taylor  8_11 -0.393604   run1
taylor_1e3c_1s_56C  taylor  8_11 -0.392899   run2
taylor_1e3c_1s_56C  taylor  8_11 -0.392473   run3
taylor_1e3c_1s_56C  taylor  8_11 -0.389959   run4
taylor_1e3c_1s_56C  taylor  8_11 -0.387946   run5
What I would like to do is isolate the rows that share the same index, id, and pair, compute the mean of the value column for each such group, and put the result in a new DataFrame. Because this effectively averages over all possible values of subdir, that column should also be dropped. The output should look something like this:
                        id  pair     value
taylor_1e3c_1s_56C  taylor  6_13 -0.392351
taylor_1e3c_1s_56C  taylor  8_11 -0.391376
How should I do this in pandas?
Recommended answer
Use syntactic sugar: call groupby on the value Series, passing the index and the id and pair Series as keys, then aggregate with mean:
df = df['value'].groupby([df.index, df['id'], df['pair']]).mean().reset_index(level=[1,2])
print (df)
id pair value
taylor_1e3c_1s_56C taylor 6_13 -0.392351
taylor_1e3c_1s_56C taylor 8_11 -0.391376
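To try this without the original data file, the sketch below rebuilds the sample frame from the question and applies the same groupby. The way the frame is constructed (the helper lists and the repeated index label) is an assumption made for illustration; only the groupby chain comes from the answer, and the result is stored in a new name, means, instead of overwriting df.

```python
import pandas as pd

# Sample values copied from the question; building the frame this way is an assumption.
values_6_13 = [-0.398716, -0.397820, -0.397310, -0.390520, -0.377390]
values_8_11 = [-0.393604, -0.392899, -0.392473, -0.389959, -0.387946]
runs = ['run1', 'run2', 'run3', 'run4', 'run5']

df = pd.DataFrame(
    {
        'id': ['taylor'] * 10,
        'pair': ['6_13'] * 5 + ['8_11'] * 5,
        'value': values_6_13 + values_8_11,
        'subdir': runs * 2,
    },
    index=['taylor_1e3c_1s_56C'] * 10,
)

# Group the 'value' Series by the index and by the 'id' and 'pair' columns,
# average each group, then move levels 1 and 2 (id, pair) back into columns.
means = (
    df['value']
    .groupby([df.index, df['id'], df['pair']])
    .mean()
    .reset_index(level=[1, 2])
)
print(means)
```

Because only the value column is selected before grouping, subdir never reaches the result and does not need to be dropped separately.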
Classic solution: first call reset_index to turn the index into a column, then groupby by column names and aggregate with mean:
df = df.reset_index().groupby(['index','id','pair'])['value'].mean().reset_index(level=[1,2])
print (df)
id pair value
index
taylor_1e3c_1s_56C taylor 6_13 -0.392351
taylor_1e3c_1s_56C taylor 8_11 -0.391376
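If a flat result is acceptable, with the old index kept as an ordinary column, one possible variation is to pass as_index=False to groupby so that no trailing reset_index is needed. This is not part of the answer above, only a sketch; the small frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data with the same column layout as the question.
df = pd.DataFrame(
    {
        'id': ['taylor'] * 4,
        'pair': ['6_13', '6_13', '8_11', '8_11'],
        'value': [-0.4, -0.2, -0.3, -0.1],
        'subdir': ['run1', 'run2', 'run1', 'run2'],
    },
    index=['sample_a'] * 4,
)

# reset_index() turns the unnamed index into a column called 'index';
# as_index=False keeps the grouping keys as columns in the result.
flat = (
    df.reset_index()
    .groupby(['index', 'id', 'pair'], as_index=False)['value']
    .mean()
)
print(flat)
```

The trade-off is that the result has a default integer index, so the original label has to be restored with set_index('index') if it is needed as the index again.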
Details:
print (df.reset_index())
index id pair value subdir
0 taylor_1e3c_1s_56C taylor 6_13 -0.398716 run1
1 taylor_1e3c_1s_56C taylor 6_13 -0.397820 run2
2 taylor_1e3c_1s_56C taylor 6_13 -0.397310 run3
3 taylor_1e3c_1s_56C taylor 6_13 -0.390520 run4
4 taylor_1e3c_1s_56C taylor 6_13 -0.377390 run5
5 taylor_1e3c_1s_56C taylor 8_11 -0.393604 run1
6 taylor_1e3c_1s_56C taylor 8_11 -0.392899 run2
7 taylor_1e3c_1s_56C taylor 8_11 -0.392473 run3
8 taylor_1e3c_1s_56C taylor 8_11 -0.389959 run4
9 taylor_1e3c_1s_56C taylor 8_11 -0.387946 run5
After aggregating with mean, the result is a Series with a 3-level MultiIndex:
print (df.reset_index().groupby(['index','id','pair'])['value'].mean())
index               id      pair
taylor_1e3c_1s_56C  taylor  6_13   -0.392351
                            8_11   -0.391376
Name: value, dtype: float64
So reset_index is needed to convert the second and third levels back to columns:
print (df.reset_index()
.groupby(['index','id','pair'])['value']
.mean()
.reset_index(level=[1,2]))
id pair value
index
taylor_1e3c_1s_56C taylor 6_13 -0.392351
taylor_1e3c_1s_56C taylor 8_11 -0.391376
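The intermediate MultiIndex can also be inspected directly, which makes it easier to see which levels reset_index(level=[1,2]) moves. The snippet below is a small sketch on made-up data (the frame and its values are hypothetical); it only uses the calls already shown in the answer plus the nlevels and names attributes of the index.

```python
import pandas as pd

# Hypothetical toy data with the same column layout as the question.
df = pd.DataFrame(
    {
        'id': ['taylor'] * 4,
        'pair': ['6_13', '6_13', '8_11', '8_11'],
        'value': [-0.4, -0.2, -0.3, -0.1],
        'subdir': ['run1', 'run2', 'run1', 'run2'],
    },
    index=['sample_a'] * 4,
)

# After the aggregation the keys form a MultiIndex on a Series.
grouped = df.reset_index().groupby(['index', 'id', 'pair'])['value'].mean()
print(grouped.index.nlevels)       # number of index levels
print(list(grouped.index.names))   # names of those levels

# Moving levels 1 and 2 into columns leaves only 'index' as the index.
result = grouped.reset_index(level=[1, 2])
print(result.index.name)
print(list(result.columns))
```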