pandas describe by - additional parameters


Question

I see that the pandas library has a describe function which returns some useful statistics. However, is there a way to add additional rows to the output, such as standard deviation (.std), median absolute deviation (.mad), or the count of unique values?

I can call df.describe(), but I can't find out how to add these additional summary statistics.

Recommended answer

The default describe looks like this:

import numpy as np
import pandas as pd

# Seed the generator so the random sample is reproducible.
np.random.seed([3,1415])
df = pd.DataFrame(np.random.rand(100, 5), columns=list('ABCDE'))

df.describe()

                A           B           C           D           E
count  100.000000  100.000000  100.000000  100.000000  100.000000
mean     0.495871    0.472939    0.455570    0.503899    0.451341
std      0.303589    0.291968    0.294984    0.269936    0.284666
min      0.006453    0.001559    0.001068    0.015311    0.009526
25%      0.239379    0.219141    0.196251    0.294371    0.202956
50%      0.529596    0.456548    0.376558    0.532002    0.432936
75%      0.759452    0.739666    0.665563    0.730702    0.686793
max      0.999799    0.994510    0.997271    0.981551    0.979221
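
As a side note, describe itself takes only a few parameters (percentiles, include, exclude), and none of them adds an arbitrary statistic as a new row. The sketch below uses a small made-up frame (demo is a hypothetical name, not the frame from this answer) to show that percentiles only changes which quantile rows appear:

```python
import pandas as pd

# Small illustrative frame (made-up data, not the frame used in the answer).
demo = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0],
                     "B": [10.0, 20.0, 30.0, 40.0]})

# percentiles selects the quantile rows; the median (50%) is always included.
summary = demo.describe(percentiles=[0.1, 0.9])
print(summary.index.tolist())
```

That limitation is why the rest of the answer wraps describe in a custom function.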

Updated for pandas 0.20
I'd make my own describe like below. It should be obvious how to add more.

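def describe(df, stats):
    # Start from the default summary, then add one row per extra statistic.
    d = df.describe()
    # Aggregate only the columns describe() kept. pd.concat and df[...] stand in for
    # DataFrame.append and reindex_axis, which newer pandas releases no longer provide.
    return pd.concat([d, df[d.columns].agg(stats)])

# 'mad' is accepted only by pandas versions before 2.0, where DataFrame.mad still exists.
describe(df, ['skew', 'mad', 'kurt'])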

                A           B           C           D           E
count  100.000000  100.000000  100.000000  100.000000  100.000000
mean     0.495871    0.472939    0.455570    0.503899    0.451341
std      0.303589    0.291968    0.294984    0.269936    0.284666
min      0.006453    0.001559    0.001068    0.015311    0.009526
25%      0.239379    0.219141    0.196251    0.294371    0.202956
50%      0.529596    0.456548    0.376558    0.532002    0.432936
75%      0.759452    0.739666    0.665563    0.730702    0.686793
max      0.999799    0.994510    0.997271    0.981551    0.979221
skew    -0.014942    0.048054    0.247244   -0.125151    0.066156
mad      0.267730    0.249968    0.254351    0.228558    0.242874
kurt    -1.323469   -1.223123   -1.095713   -1.083420   -1.148642
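
The question also asks for a count of unique values, and the built-in mad (which computed the mean absolute deviation, not the median one) was removed in pandas 2.0. A minimal sketch of one way to cover both, assuming agg accepts a list that mixes statistic names with plain functions: the helpers mean_abs_dev, median_abs_dev and describe_plus, and the demo frame, are hypothetical names introduced here for illustration.

```python
import pandas as pd

def mean_abs_dev(s):
    """Mean absolute deviation around the mean (what the old .mad computed)."""
    return (s - s.mean()).abs().mean()

def median_abs_dev(s):
    """Median absolute deviation around the median."""
    return (s - s.median()).abs().median()

def describe_plus(df, stats):
    """Default describe() rows followed by one row per entry in stats."""
    d = df.describe()
    # Aggregate only the columns that describe() kept (the numeric ones).
    extra = df[d.columns].agg(stats)
    return pd.concat([d, extra])

# Small made-up frame so the extra rows are easy to check by hand.
demo = pd.DataFrame({"A": [1.0, 2.0, 2.0, 5.0],
                     "B": [0.0, 0.0, 4.0, 4.0]})

out = describe_plus(demo, ["skew", "kurt", "nunique", mean_abs_dev, median_abs_dev])
print(out)
```

Rows produced by a function are labelled with the function's name, so giving the helpers descriptive names keeps the table readable.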

Old answer

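def describe(df):
    # Transpose so each statistic becomes a column, add the extra ones, then transpose back.
    # df.mad() exists only in pandas versions before 2.0.
    return pd.concat([df.describe().T,
                      df.mad().rename('mad'),
                      df.skew().rename('skew'),
                      df.kurt().rename('kurt'),
                     ], axis=1).T

describe(df)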

                A           B           C           D           E
count  100.000000  100.000000  100.000000  100.000000  100.000000
mean     0.495871    0.472939    0.455570    0.503899    0.451341
std      0.303589    0.291968    0.294984    0.269936    0.284666
min      0.006453    0.001559    0.001068    0.015311    0.009526
25%      0.239379    0.219141    0.196251    0.294371    0.202956
50%      0.529596    0.456548    0.376558    0.532002    0.432936
75%      0.759452    0.739666    0.665563    0.730702    0.686793
max      0.999799    0.994510    0.997271    0.981551    0.979221
mad      0.267730    0.249968    0.254351    0.228558    0.242874
skew    -0.014942    0.048054    0.247244   -0.125151    0.066156
kurt    -1.323469   -1.223123   -1.095713   -1.083420   -1.148642
