Pandas 'describe' is not returning a summary of all columns

Problem description

I am running 'describe()' on a dataframe and getting summaries of only the int columns (pandas 0.14.0).

The documentation says that for object columns, the frequency of the most common value and additional statistics would be returned. What could be wrong? (No error message is returned, by the way.)

I think this is how the function is designed to behave on mixed column types in a dataframe, although the documentation fails to mention it.

Sample code:

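import pandas as pd

df_test = pd.DataFrame({'$a': [1, 2], '$b': [10, 20]})
df_test.dtypes                               # both columns are integer
df_test.describe()                           # summarizes $a and $b
df_test['$a'] = df_test['$a'].astype(str)    # $a now holds strings (object dtype)
df_test.describe()                           # only $b is summarized
df_test['$a'].describe()                     # count/unique/top/freq for $a
df_test['$b'].describe()                     # numeric summary for $b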

In the meantime, my ugly workaround:

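def my_df_describe(df):
    """Describe numeric and object columns separately."""
    objects = []
    numerics = []
    for c in df:
        if df[c].dtype == object:
            objects.append(c)    # string / mixed-type columns
        else:
            numerics.append(c)   # everything else (ints, floats, ...)

    return df[numerics].describe(), df[objects].describe()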
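For comparison, the same split can be written with DataFrame.select_dtypes, which picks columns by dtype instead of looping over them. This is a minimal sketch assuming a pandas version that provides select_dtypes; the helper name describe_split is hypothetical.

```python
import numpy as np
import pandas as pd


def describe_split(df):
    """Hypothetical helper: describe numeric and object columns separately."""
    numerics = df.select_dtypes(include=[np.number])   # int/float columns
    objects = df.select_dtypes(include=['object'])     # object (string) columns
    return numerics.describe(), objects.describe()


# dtype=object keeps $a an object column, as in the question.
df_test = pd.DataFrame({'$a': pd.Series(['1', '2'], dtype=object),
                        '$b': [10, 20]})
num_summary, obj_summary = describe_split(df_test)
print(num_summary)
print(obj_summary)
```

One difference from my_df_describe: np.number does not match bool columns, so booleans are dropped here, whereas the loop above would put them in the numeric group.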

Recommended answer

As of pandas 0.15.0, use the parameter include='all', i.e. DataFrame.describe(include = 'all'), to get a summary of all the columns when the dataframe has mixed column types. By default, describe() summarizes only the numerical columns (it falls back to the object columns only when the frame has no numeric columns at all).

Example:

In[1]:

import numpy as np
import pandas as pd

df = pd.DataFrame({'$a': ['a', 'b', 'c', 'd', 'a'], '$b': np.arange(5)})
df.describe(include = 'all')

Out[1]:

         $a        $b
count     5  5.000000
unique    4       NaN
top       a       NaN
freq      2       NaN
mean    NaN  2.000000
std     NaN  1.581139
min     NaN  0.000000
25%     NaN  1.000000
50%     NaN  2.000000
75%     NaN  3.000000
max     NaN  4.000000

The numerical columns will have NaNs for summary statistics pertaining to objects (strings) and vice versa.
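A quick way to see this NaN pattern is to index into the combined summary. The sketch below rebuilds the example frame (assuming pandas 0.15.0 or later) and checks which cells are missing; the explicit dtype=object is an assumption added so that $a stays an object column regardless of the pandas version's default string dtype.

```python
import numpy as np
import pandas as pd

# Same data as In[1]; dtype=object pins $a to object dtype.
df = pd.DataFrame({'$a': pd.Series(['a', 'b', 'c', 'd', 'a'], dtype=object),
                   '$b': np.arange(5)})
summary = df.describe(include='all')

# Object-only statistics are missing for the numeric column...
print(summary.loc[['unique', 'top', 'freq'], '$b'].isna().all())  # True
# ...and numeric-only statistics are missing for the object column.
print(summary.loc[['mean', 'std', 'min', 'max'], '$a'].isna().all())  # True
```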

Summarizing only the numeric or only the object columns

  1. To call describe() on just the numerical columns, use describe(include = [np.number]).
  2. To call describe() on just the object (string) columns, use describe(include = ['O']).

In[2]:

df.describe(include = [np.number])

Out[2]:

         $b
count   5.000000
mean    2.000000
std     1.581139
min     0.000000
25%     1.000000
50%     2.000000
75%     3.000000
max     4.000000

In[3]:

df.describe(include = ['O'])

Out[3]:

    $a
count   5
unique  4
top     a
freq    2
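describe() also accepts an exclude parameter, so the object-only summary can equally be obtained by excluding the numeric dtypes. A minimal sketch, reusing the example frame (with $a pinned to object dtype as an added assumption):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'$a': pd.Series(['a', 'b', 'c', 'd', 'a'], dtype=object),
                   '$b': np.arange(5)})

# Dropping the numeric dtypes leaves only the object column...
objects_only = df.describe(exclude=[np.number])
# ...which matches asking for object columns directly.
objects_direct = df.describe(include=['O'])

print(objects_only.equals(objects_direct))  # True
print(list(objects_only.index))  # ['count', 'unique', 'top', 'freq']
```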
