Pandas 'describe' is not returning a summary of all columns
Question
I am running describe() on a DataFrame and getting summaries of only the int columns (pandas 0.14.0).
The documentation says that for object columns the frequency of the most common value and additional statistics would be returned. What could be wrong? (No error message is returned, by the way.)
I think this is how the function is designed to behave on mixed column types in a DataFrame, although the documentation does not mention it.
Example code:
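import pandas as pd

df_test = pd.DataFrame({'$a': [1, 2], '$b': [10, 20]})
df_test.dtypes            # both columns are integer
df_test.describe()        # summarizes $a and $b
df_test['$a'] = df_test['$a'].astype(str)
df_test.describe()        # now only $b is summarized
df_test['$a'].describe()  # count/unique/top/freq for the string column
df_test['$b'].describe()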
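The last two lines show that Series.describe() does summarize the string column, so one workaround is simply to describe each column on its own. A minimal sketch, reusing the df_test frame from the question; the per_column name is illustrative only:

```python
import pandas as pd

df_test = pd.DataFrame({'$a': [1, 2], '$b': [10, 20]})
df_test['$a'] = df_test['$a'].astype(str)

# Series.describe() picks its statistics from the Series' own dtype,
# so describing column by column yields a summary for every column.
per_column = {col: df_test[col].describe() for col in df_test.columns}

for col, summary in per_column.items():
    print(col, summary.index.tolist())
```

The trade-off is that the summaries come back as separate Series with different indexes rather than as one aligned table.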
In the meantime, my ugly workaround:
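def my_df_describe(df):
    # Split column names into object (string) columns and everything else
    objects = []
    numerics = []
    for c in df:
        if df[c].dtype == object:
            objects.append(c)
        else:
            numerics.append(c)
    # Describe each group separately so neither is silently dropped
    return df[numerics].describe(), df[objects].describe()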
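A shorter variant of the same idea, sketched with DataFrame.select_dtypes instead of the manual loop (this assumes a pandas version that provides select_dtypes). Two deliberate differences from the function above: it splits numeric vs. non-numeric columns rather than object vs. non-object, and it returns None for an empty group, because recent pandas versions raise ValueError when asked to describe a DataFrame with no columns.

```python
import pandas as pd

def my_df_describe(df):
    """Return (numeric_summary, non_numeric_summary) for df.

    Either element is None when df has no columns of that kind,
    since describe() on a column-less DataFrame raises ValueError.
    """
    numerics = df.select_dtypes(include='number')
    others = df.select_dtypes(exclude='number')
    num_summary = numerics.describe() if numerics.shape[1] else None
    other_summary = others.describe() if others.shape[1] else None
    return num_summary, other_summary

df_test = pd.DataFrame({'$a': [1, 2], '$b': [10, 20]})
df_test['$a'] = df_test['$a'].astype(str)
num_summary, other_summary = my_df_describe(df_test)
```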
Recommended answer
As of pandas 0.15.0, use the parameter DataFrame.describe(include='all') to get a summary of all the columns when the DataFrame has mixed column types. By default, describe() summarizes only the numerical columns.
Example:
In[1]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'$a': ['a', 'b', 'c', 'd', 'a'], '$b': np.arange(5)})
df.describe(include='all')
Out[1]:
        $a        $b
count    5  5.000000
unique   4       NaN
top      a       NaN
freq     2       NaN
mean   NaN  2.000000
std    NaN  1.581139
min    NaN  0.000000
25%    NaN  1.000000
50%    NaN  2.000000
75%    NaN  3.000000
max    NaN  4.000000
The numerical columns will have NaN for the summary statistics that apply only to objects (strings), and vice versa.
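To check that NaN pattern programmatically, here is a sketch that rebuilds the In[1] frame and tests which cells are NaN. The row lists are copied from the output above; dropping the NaNs from one column recovers just that column's own statistics.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'$a': ['a', 'b', 'c', 'd', 'a'], '$b': np.arange(5)})
summary = df.describe(include='all')

# Rows that only make sense for object (string) columns
object_rows = ['unique', 'top', 'freq']
# Rows that only make sense for numeric columns
numeric_rows = ['mean', 'std', 'min', '25%', '50%', '75%', 'max']

print(summary.loc[object_rows, '$b'].isna().all())   # numeric column: NaN here
print(summary.loc[numeric_rows, '$a'].isna().all())  # object column: NaN here

# Dropping the NaNs leaves only the statistics that apply to $b
print(summary['$b'].dropna().index.tolist())
```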
Summarizing only numeric or only object columns
- To call describe() on just the numerical columns, use describe(include=[np.number]).
- To call describe() on just the objects (strings), use describe(include=['O']).
In[2]:
df.describe(include = [np.number])
Out[2]:
$b
count 5.000000
mean 2.000000
std 1.581139
min 0.000000
25% 1.000000
50% 2.000000
75% 3.000000
max 4.000000
In[3]:
df.describe(include = ['O'])
Out[3]:
$a
count 5
unique 4
top a
freq 2
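describe() also accepts exclude=, the complement of include=. A sketch reusing the df from the examples, which selects the non-numeric columns by excluding np.number rather than listing 'O'; this keeps every column that is not numeric, whatever its exact dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'$a': ['a', 'b', 'c', 'd', 'a'], '$b': np.arange(5)})

# include= keeps only the listed dtypes; exclude= drops them and
# summarizes whatever columns are left.
numeric_only = df.describe(include=[np.number])
non_numeric = df.describe(exclude=[np.number])

print(numeric_only.columns.tolist())
print(non_numeric.columns.tolist())
```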