pandas groupby two columns and summarize by mean

Problem description

I have a data frame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['id'] = [1,1,1,2,2,3,3,3,3,4,4,5]
df['view'] = ['A', 'B', 'A', 'A','B', 'A', 'B', 'A', 'A','B', 'A', 'B']
df['value'] = np.random.random(12)


    id view     value
0    1    A  0.625781
1    1    B  0.330084
2    1    A  0.024532
3    2    A  0.154651
4    2    B  0.196960
5    3    A  0.393941
6    3    B  0.607217
7    3    A  0.422823
8    3    A  0.994323
9    4    B  0.366650
10   4    A  0.649585
11   5    B  0.513923

I now want to summarize each view for each id by the mean of 'value'. Some ids have repeated observations for a view, and I want to collapse those into a single row. For example, id 1 has two observations for A.

I tried:

res = df.groupby(['id', 'view'])['value'].mean()

This is almost what I want, but pandas combines the id and view columns into one, which I do not want.

id  view
1   A       0.325157
    B       0.330084
2   A       0.154651
    B       0.196960
3   A       0.603696
    B       0.607217
4   A       0.649585
    B       0.366650
5   B       0.513923

Also, res.shape is (9,).

My desired output is this:

id  view    value
1   A       0.325157
1   B       0.330084
2   A       0.154651
2   B       0.196960
3   A       0.603696
3   B       0.607217
4   A       0.649585
4   B       0.366650
5   B       0.513923

where the column names and dimensions are kept and the id is repeated on every row. Each id should have only one row for A and one row for B.

How can I achieve this?

Recommended answer

You need reset_index or the parameter as_index=False in groupby, because you get a MultiIndex, and by default the higher levels of the index are sparsified to make the console output a bit easier on the eyes:

import numpy as np
import pandas as pd

# Fix the seed so the random values below are reproducible
np.random.seed(100)
df = pd.DataFrame()
df['id'] = [1,1,1,2,2,3,3,3,3,4,4,5]
df['view'] = ['A', 'B', 'A', 'A','B', 'A', 'B', 'A', 'A','B', 'A', 'B']
df['value'] = np.random.random(12)
print(df)
    id view     value
0    1    A  0.543405
1    1    B  0.278369
2    1    A  0.424518
3    2    A  0.844776
4    2    B  0.004719
5    3    A  0.121569
6    3    B  0.670749
7    3    A  0.825853
8    3    A  0.136707
9    4    B  0.575093
10   4    A  0.891322
11   5    B  0.209202

# Option 1: aggregate, then move the MultiIndex levels back into columns
res = df.groupby(['id', 'view'])['value'].mean().reset_index()
print(res)
   id view     value
0   1    A  0.483961
1   1    B  0.278369
2   2    A  0.844776
3   2    B  0.004719
4   3    A  0.361376
5   3    B  0.670749
6   4    A  0.891322
7   4    B  0.575093
8   5    B  0.209202

# Option 2: keep the group keys as ordinary columns from the start
res = df.groupby(['id', 'view'], as_index=False)['value'].mean()
print(res)
   id view     value
0   1    A  0.483961
1   1    B  0.278369
2   2    A  0.844776
3   2    B  0.004719
4   3    A  0.361376
5   3    B  0.670749
6   4    A  0.891322
7   4    B  0.575093
8   5    B  0.209202
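
To make the point about sparsified display concrete, here is a minimal sketch on a small hand-made frame (the values are chosen for illustration and are not from the post). It checks that each index entry of the grouped result is a full (id, view) pair, uses the pandas display option display.multi_sparse to print the repeated ids, and confirms that both approaches above produce the same flat DataFrame.

```python
import pandas as pd

# Small hand-made frame (illustrative values, not the data from the post)
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2],
    "view":  ["A", "B", "A", "A", "B"],
    "value": [1.0, 5.0, 3.0, 10.0, 30.0],
})

# Grouping by two columns yields a Series indexed by a two-level MultiIndex
grouped = df.groupby(["id", "view"])["value"].mean()

# Every index entry is a full (id, view) tuple; only the printed form hides repeats
index_tuples = grouped.index.tolist()
print(index_tuples)

# Turning sparsified display off prints the repeated ids without changing the data
with pd.option_context("display.multi_sparse", False):
    print(grouped)

# Both routes from the answer should give the same flat DataFrame
flat_a = grouped.reset_index()
flat_b = df.groupby(["id", "view"], as_index=False)["value"].mean()
print(flat_a.equals(flat_b))
```

If only the printed form matters, changing the display option may be enough; if later code needs id and view as regular columns, reset_index or as_index=False is the appropriate choice.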
