Calculate summary statistics of columns in dataframe
Question
I have a dataframe of the following form (for example):
shopper_num,is_martian,number_of_items,count_pineapples,birth_country,tranpsortation_method
1,FALSE,0,0,MX,
2,FALSE,1,0,MX,
3,FALSE,0,0,MX,
4,FALSE,22,0,MX,
5,FALSE,0,0,MX,
6,FALSE,0,0,MX,
7,FALSE,5,0,MX,
8,FALSE,0,0,MX,
9,FALSE,4,0,MX,
10,FALSE,2,0,MX,
11,FALSE,0,0,MX,
12,FALSE,13,0,MX,
13,FALSE,0,0,CA,
14,FALSE,0,0,US,
How can I use Pandas to calculate summary statistics of each column (the column data types vary, and some columns have no information), and then return a dataframe of the following form:
columnname, max, min, median,
is_martian, NA, NA, FALSE
Recommended answer
describe may give you everything you want; otherwise you can perform aggregations using groupby and pass a list of agg functions: http://pandas.pydata.org/pandas-docs/stable/groupby.html#applying-multiple-functions-at-once
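As a rough illustration of the groupby route, the sketch below loads the question's sample data and passes a list of function names to agg. The grouping key (birth_country) and the summarised column (number_of_items) are assumptions chosen only for illustration; the answer does not say which columns to group on.

```python
import io

import pandas as pd

# Sample data copied from the question (column names kept exactly as posted).
CSV_TEXT = """shopper_num,is_martian,number_of_items,count_pineapples,birth_country,tranpsortation_method
1,FALSE,0,0,MX,
2,FALSE,1,0,MX,
3,FALSE,0,0,MX,
4,FALSE,22,0,MX,
5,FALSE,0,0,MX,
6,FALSE,0,0,MX,
7,FALSE,5,0,MX,
8,FALSE,0,0,MX,
9,FALSE,4,0,MX,
10,FALSE,2,0,MX,
11,FALSE,0,0,MX,
12,FALSE,13,0,MX,
13,FALSE,0,0,CA,
14,FALSE,0,0,US,
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Assumed grouping key and target column, picked only to show the pattern:
# one row per country, one column per aggregation function in the list.
per_country = df.groupby("birth_country")["number_of_items"].agg(
    ["max", "min", "median"]
)
print(per_country)
```

Passing the functions as a list gives one output column per function, which is close to the max/min/median layout asked for in the question.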
In [43]:
df.describe()
Out[43]:
       shopper_num  is_martian  number_of_items  count_pineapples
count      14.0000          14        14.000000                14
mean        7.5000           0         3.357143                 0
std         4.1833           0         6.452276                 0
min         1.0000       False         0.000000                 0
25%         4.2500           0         0.000000                 0
50%         7.5000           0         0.000000                 0
75%        10.7500           0         3.500000                 0
max        14.0000       False        22.000000                 0
[8 rows x 4 columns]
Note that some columns cannot be summarised because there is no logical way to summarise them, for instance columns containing string data.
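String columns have no mean or quartiles, but describe also accepts an include parameter, and with include="all" pandas reports count, unique, top and freq for non-numeric columns. The sketch below is not part of the original answer; it reuses the question's sample data, and the exact rows shown for each column may differ between pandas versions.

```python
import io

import pandas as pd

# Sample data copied from the question (column names kept exactly as posted).
CSV_TEXT = """shopper_num,is_martian,number_of_items,count_pineapples,birth_country,tranpsortation_method
1,FALSE,0,0,MX,
2,FALSE,1,0,MX,
3,FALSE,0,0,MX,
4,FALSE,22,0,MX,
5,FALSE,0,0,MX,
6,FALSE,0,0,MX,
7,FALSE,5,0,MX,
8,FALSE,0,0,MX,
9,FALSE,4,0,MX,
10,FALSE,2,0,MX,
11,FALSE,0,0,MX,
12,FALSE,13,0,MX,
13,FALSE,0,0,CA,
14,FALSE,0,0,US,
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# include="all" asks describe() to report on every column, not only numeric ones.
# Statistics that do not apply to a column (e.g. mean of a string column) are NaN.
full = df.describe(include="all")
print(full[["birth_country", "number_of_items"]])
```

For birth_country the useful rows are count, unique, top and freq; the numeric rows such as mean and std are left as NaN.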
If you prefer, you can transpose the result:
In [47]:
df.describe().transpose()
Out[47]:
                  count      mean       std    min   25%  50%    75%    max
shopper_num          14       7.5    4.1833      1  4.25  7.5  10.75     14
is_martian           14         0         0  False     0    0      0  False
number_of_items      14  3.357143  6.452276      0     0    0    3.5     22
count_pineapples     14         0         0      0     0    0      0      0
[4 rows x 8 columns]
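To get closer to the columnname, max, min, median layout requested in the question, the transposed result can be trimmed to those three statistics. This is a minimal sketch rather than part of the original answer: it assumes the 50% row is an acceptable stand-in for the median, and which columns describe includes by default may vary between pandas versions.

```python
import io

import pandas as pd

# Sample data copied from the question (column names kept exactly as posted).
CSV_TEXT = """shopper_num,is_martian,number_of_items,count_pineapples,birth_country,tranpsortation_method
1,FALSE,0,0,MX,
2,FALSE,1,0,MX,
3,FALSE,0,0,MX,
4,FALSE,22,0,MX,
5,FALSE,0,0,MX,
6,FALSE,0,0,MX,
7,FALSE,5,0,MX,
8,FALSE,0,0,MX,
9,FALSE,4,0,MX,
10,FALSE,2,0,MX,
11,FALSE,0,0,MX,
12,FALSE,13,0,MX,
13,FALSE,0,0,CA,
14,FALSE,0,0,US,
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

summary = (
    df.describe()
    .transpose()[["max", "min", "50%"]]   # keep only the requested statistics
    .rename(columns={"50%": "median"})    # the 50th percentile is the median
    .rename_axis("columnname")            # label the index as in the question
)
print(summary)
```

Columns with no data, such as tranpsortation_method here, show up with NaN in every statistic, which matches the NA entries in the form the question asks for.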