pandas groupby two columns and summarize by mean
Question
I have a data frame like this:
import numpy as np
import pandas as pd

df = pd.DataFrame()
df['id'] = [1,1,1,2,2,3,3,3,3,4,4,5]
df['view'] = ['A', 'B', 'A', 'A','B', 'A', 'B', 'A', 'A','B', 'A', 'B']
df['value'] = np.random.random(12)
id view value
0 1 A 0.625781
1 1 B 0.330084
2 1 A 0.024532
3 2 A 0.154651
4 2 B 0.196960
5 3 A 0.393941
6 3 B 0.607217
7 3 A 0.422823
8 3 A 0.994323
9 4 B 0.366650
10 4 A 0.649585
11 5 B 0.513923
I now want to summarize each view for each id by the mean of 'value'.
Think of it this way: some ids have repeated observations for a view, and I want to summarize them. For example, id 1 has two observations for A.
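To see which (id, view) pairs actually have repeated observations before averaging them, a quick check like the one below can help. This is a minimal sketch that rebuilds the question's frame; it only uses the standard pandas calls `DataFrame.duplicated` and `GroupBy.size`, and the variable names `repeated` and `counts` are illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the question; the random values do not affect which pairs repeat
df = pd.DataFrame()
df['id'] = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5]
df['view'] = ['A', 'B', 'A', 'A', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'B']
df['value'] = np.random.random(12)

# keep=False flags every row of a repeated (id, view) pair, not just the later copies
repeated = df[df.duplicated(['id', 'view'], keep=False)]
print(repeated)

# Number of observations per (id, view) pair; pairs with a count above 1 get averaged
counts = df.groupby(['id', 'view']).size()
print(counts)
```

Here only id 1 / view A (two rows) and id 3 / view A (three rows) have more than one observation; every other pair's mean is just its single value.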
I tried:
res = df.groupby(['id', 'view'])['value'].mean()
This is almost what I want, but pandas combines the id and view columns into one, which I do not want.
id view
1 A 0.325157
B 0.330084
2 A 0.154651
B 0.196960
3 A 0.603696
B 0.607217
4 A 0.649585
B 0.366650
5 B 0.513923
Also, res.shape is (9,).
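Both symptoms have the same cause: selecting ['value'] before .mean() returns a one-dimensional Series whose index is a two-level MultiIndex built from id and view. The blank id cells in the printout are only a display convention. The sketch below (a reconstruction of the question's frame, not code from the original post) inspects the result and uses the pandas display option `display.multi_sparse` to print the id on every row.

```python
import numpy as np
import pandas as pd

# Same frame as in the question (values are random, so only the structure matters here)
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5],
    'view': ['A', 'B', 'A', 'A', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'B'],
    'value': np.random.random(12),
})

res = df.groupby(['id', 'view'])['value'].mean()

print(type(res).__name__)  # Series: only the 'value' column was aggregated
print(res.index.names)     # the group keys became the two index levels, id and view
print(res.shape)           # (9,): a Series is one-dimensional

# Blank id cells in the default printout are display-only ("sparsified");
# turning the option off prints the id on every row
with pd.option_context('display.multi_sparse', False):
    print(res)

# The index stores the full id for every row regardless of how it is printed
ids = res.index.get_level_values('id').tolist()
print(ids)  # [1, 1, 2, 2, 3, 3, 4, 4, 5]
```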
The output I want looks like this:
id view value
1 A 0.325157
1 B 0.330084
2 A 0.154651
2 B 0.196960
3 A 0.603696
3 B 0.607217
4 A 0.649585
4 B 0.366650
5 B 0.513923
The column names and dimensions should be kept, and the id should be repeated on every row. Each id should have at most one row for A and one row for B.
How can I achieve this?
Answer
You need reset_index, or the parameter as_index=False in groupby, because you get a MultiIndex, and by default the higher levels of the index are sparsified to make the console output a bit easier on the eyes:
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame()
df['id'] = [1,1,1,2,2,3,3,3,3,4,4,5]
df['view'] = ['A', 'B', 'A', 'A','B', 'A', 'B', 'A', 'A','B', 'A', 'B']
df['value'] = np.random.random(12)
print(df)
id view value
0 1 A 0.543405
1 1 B 0.278369
2 1 A 0.424518
3 2 A 0.844776
4 2 B 0.004719
5 3 A 0.121569
6 3 B 0.670749
7 3 A 0.825853
8 3 A 0.136707
9 4 B 0.575093
10 4 A 0.891322
11 5 B 0.209202
res = df.groupby(['id', 'view'])['value'].mean().reset_index()
print(res)
id view value
0 1 A 0.483961
1 1 B 0.278369
2 2 A 0.844776
3 2 B 0.004719
4 3 A 0.361376
5 3 B 0.670749
6 4 A 0.891322
7 4 B 0.575093
8 5 B 0.209202
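If the averaged column should not keep the name value, Series.reset_index also accepts a name argument that labels the column holding the Series values. A minimal sketch, reusing the seeded frame from the answer; the column name mean_value is purely illustrative.

```python
import numpy as np
import pandas as pd

# Seeded frame from the answer, so the numbers match the printout above
np.random.seed(100)
df = pd.DataFrame()
df['id'] = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5]
df['view'] = ['A', 'B', 'A', 'A', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'B']
df['value'] = np.random.random(12)

# Series.reset_index(name=...) sets the label of the column that receives the Series values;
# 'mean_value' is just an illustrative name
res = df.groupby(['id', 'view'])['value'].mean().reset_index(name='mean_value')
print(res.columns.tolist())  # ['id', 'view', 'mean_value']
print(res)
```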
res = df.groupby(['id', 'view'], as_index=False)['value'].mean()
print(res)
id view value
0 1 A 0.483961
1 1 B 0.278369
2 2 A 0.844776
3 2 B 0.004719
4 3 A 0.361376
5 3 B 0.670749
6 4 A 0.891322
7 4 B 0.575093
8 5 B 0.209202
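The two approaches should give the same frame, which can be checked with pandas' own testing helper rather than by eye. The sketch below does that and also shows named aggregation, an alternative not used in the answer, which keeps id and view as columns while naming the output column. It assumes a reasonably recent pandas (named aggregation was added in 0.25), and mean_value is an illustrative name.

```python
import numpy as np
import pandas as pd

# Seeded frame from the answer
np.random.seed(100)
df = pd.DataFrame()
df['id'] = [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5]
df['view'] = ['A', 'B', 'A', 'A', 'B', 'A', 'B', 'A', 'A', 'B', 'A', 'B']
df['value'] = np.random.random(12)

via_reset = df.groupby(['id', 'view'])['value'].mean().reset_index()
via_as_index = df.groupby(['id', 'view'], as_index=False)['value'].mean()

# Raises AssertionError if values, dtypes, index or column labels differ
pd.testing.assert_frame_equal(via_reset, via_as_index)

# Named aggregation: output column name = (source column, aggregation function)
via_agg = df.groupby(['id', 'view'], as_index=False).agg(mean_value=('value', 'mean'))
print(via_agg)
```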