Pandas dataframe: how to apply describe() to each group and add to new columns?
Problem description
df:
name score
A 1
A 2
A 3
A 4
A 5
B 2
B 4
B 6
B 8
I want to get a new dataframe in the following form:
name count mean std min 25% 50% 75% max
A 5 3 .. .. .. .. .. ..
B 4 5 .. .. .. .. .. ..
How can I extract the information from df.describe() and reformat it? Thanks
Recommended answer
Define some data
In[1]:
import pandas as pd
import io
data = """
name score
A 1
A 2
A 3
A 4
A 5
B 2
B 4
B 6
B 8
"""
# Raw string so that \s is not treated as a string escape sequence
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
print(df)
Out[1]:
name score
0 A 1
1 A 2
2 A 3
3 A 4
4 A 5
5 B 2
6 B 4
7 B 6
8 B 8
Solution
A nice approach to this problem uses a generator expression (see footnote) so that pd.DataFrame() can iterate over the results of groupby and construct the summary-statistics dataframe on the fly:
In[2]:
df2 = pd.DataFrame(group.describe().rename(columns={'score':name}).squeeze()
for name, group in df.groupby('name'))
print(df2)
Out[2]:
count mean std min 25% 50% 75% max
A 5 3 1.581139 1 2.0 3 4.0 5
B 4 5 2.581989 2 3.5 5 6.5 8
Here squeeze() drops the redundant dimension, converting each group's one-column summary-statistics DataFrame into a Series. Each Series carries its group name, so it becomes one labeled row of the final dataframe.
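To see what squeeze() does in isolation, here is a minimal sketch for group A only. The variable names (group_a, stats, row) are illustrative and not part of the original answer; the data matches the df defined earlier.

```python
import pandas as pd

df = pd.DataFrame({"name": list("AAAAABBBB"),
                   "score": [1, 2, 3, 4, 5, 2, 4, 6, 8]})

# Take the rows of a single group (illustrative; groupby does this per group)
group_a = df[df["name"] == "A"]

# describe() summarises only the numeric column, giving an 8-row, 1-column DataFrame
stats = group_a.describe()
print(type(stats).__name__, stats.shape)

# Rename the single column to the group label, then squeeze it down to a Series
row = stats.rename(columns={"score": "A"}).squeeze()
print(type(row).__name__, row.name)
print(row["mean"])
```

The Series name ("A") is what ends up as the row label once pd.DataFrame() stacks the per-group Series.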
Footnote: a generator expression has the form my_function(a) for a in iterator or, if iterator yields two-element tuples, as groupby does: my_function(a, b) for a, b in iterator
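The two forms in the footnote can be sketched as follows. The names squares, means, and direct are illustrative, not from the original answer. The last line is included only as a cross-check: calling describe() directly on the grouped score column should give the same per-group statistics in one step, which may be worth trying as an alternative to the generator approach.

```python
import pandas as pd

df = pd.DataFrame({"name": list("AAAAABBBB"),
                   "score": [1, 2, 3, 4, 5, 2, 4, 6, 8]})

# One-variable form: my_function(a) for a in iterator
squares = list(x * x for x in range(4))
print(squares)

# Two-variable form: iterating a groupby yields (group key, sub-dataframe) tuples
means = dict((name, group["score"].mean()) for name, group in df.groupby("name"))
print(means)

# Cross-check: describe() applied to the grouped column returns one row per group
direct = df.groupby("name")["score"].describe()
print(direct)
```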