Pandas dataframe: how to apply describe() to each group and add to new columns?

Views: 85  Published: 2020/5/18 19:46:16  Tags: python numpy pandas dataframe


Problem description

df:

I want to get a new dataframe in the following form:

   name count mean std min 25% 50% 75% max
    A     5    3    .. ..  ..  ..  ..  ..
    B     4    5    .. ..  ..  ..  ..  ..

How can I extract the information from df.describe() and reformat it? Thanks.

Recommended answer

Define some data

In[1]:
import pandas as pd
import io

data = """
name score
A      1
A      2
A      3
A      4
A      5
B      2
B      4
B      6
B      8
    """

# Parse the whitespace-separated text; the raw string keeps \s from being read as a string escape
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
print(df)

Out[1]:
  name  score
0    A      1
1    A      2
2    A      3
3    A      4
4    A      5
5    B      2
6    B      4
7    B      6
8    B      8

Solution

A nice approach to this problem uses a generator expression (see footnote) to let pd.DataFrame() iterate over the results of groupby and build the summary-stats dataframe on the fly:

In[2]:
df2 = pd.DataFrame(group.describe().rename(columns={'score':name}).squeeze()
                         for name, group in df.groupby('name'))

print(df2)

Out[2]:
   count  mean       std  min  25%  50%  75%  max
A      5     3  1.581139    1  2.0    3  4.0    5
B      4     5  2.581989    2  3.5    5  6.5    8

Here squeeze() drops a dimension: it converts each group's one-column summary-stats DataFrame into a Series, which pd.DataFrame() then uses as one row of the result.
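To make the effect of squeeze() concrete, here is a minimal sketch for a single group. The frame named group is a hypothetical stand-in for what groupby would hand over for group A; it is not part of the original answer.

```python
import pandas as pd

# Hypothetical single group, mirroring group "A" from the example data.
group = pd.DataFrame({"name": ["A"] * 5, "score": [1, 2, 3, 4, 5]})

# describe() only summarises the numeric column, so the result has one column.
stats = group.describe()

# Rename that column to the group label, then squeeze the frame to a Series.
row = stats.rename(columns={"score": "A"}).squeeze()

print(type(stats).__name__, stats.shape)       # a DataFrame with a single column
print(type(row).__name__, row.shape, row.name)  # a Series named after the group
```

The Series name ends up as the row label once pd.DataFrame() stacks the rows, which is why the rename happens before the squeeze.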

Footnote: A generator expression has the form my_function(a) for a in iterator or, if the iterator yields two-element tuples as groupby does, my_function(a, b) for a, b in iterator.
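The two-element form from the footnote can be sketched as follows. The helper name summarize is hypothetical and only stands in for my_function; the data is rebuilt inline so the block runs on its own. As a cross-check, and assuming a reasonably recent pandas, the result is compared with calling describe() directly on the grouped column, which should give the same table.

```python
import pandas as pd

df = pd.DataFrame({
    "name": list("AAAAABBBB"),
    "score": [1, 2, 3, 4, 5, 2, 4, 6, 8],
})

def summarize(name, group):
    """Hypothetical helper: one Series of summary stats, labelled with the group name."""
    return group["score"].describe().rename(name)

# Generator expression over (name, group) pairs; nothing is computed
# until pd.DataFrame() iterates over it.
rows = (summarize(name, group) for name, group in df.groupby("name"))
df2 = pd.DataFrame(rows)

# Cross-check: describe() applied directly to the grouped column.
direct = df.groupby("name")["score"].describe()

print(df2)
print(direct)
```

If name should be an ordinary column, as in the layout the question asks for, direct.reset_index() moves it out of the index.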
