Calculate summary statistics of columns in dataframe

Views: 1501 Published: 2017/2/24 19:29:27 Tags: python csv pandas dataframe


Question

I have a dataframe of the following form (for example):

shopper_num,is_martian,number_of_items,count_pineapples,birth_country,tranpsortation_method
1,FALSE,0,0,MX,
2,FALSE,1,0,MX,
3,FALSE,0,0,MX,
4,FALSE,22,0,MX,
5,FALSE,0,0,MX,
6,FALSE,0,0,MX,
7,FALSE,5,0,MX,
8,FALSE,0,0,MX,
9,FALSE,4,0,MX,
10,FALSE,2,0,MX,
11,FALSE,0,0,MX,
12,FALSE,13,0,MX,
13,FALSE,0,0,CA,
14,FALSE,0,0,US,

How can I use pandas to calculate summary statistics of each column (column data types vary, and some columns have no information), and then return a dataframe of the form:

columnname, max, min, median,

is_martian, NA, NA, FALSE

Recommended answer

describe may give you everything you want; otherwise you can perform aggregations using groupby and pass a list of agg functions: http://pandas.pydata.org/pandas-docs/stable/groupby.html#applying-multiple-functions-at-once

In [43]:

df.describe()

Out[43]:

       shopper_num is_martian  number_of_items  count_pineapples
count      14.0000         14        14.000000                14
mean        7.5000          0         3.357143                 0
std         4.1833          0         6.452276                 0
min         1.0000      False         0.000000                 0
25%         4.2500          0         0.000000                 0
50%         7.5000          0         0.000000                 0
75%        10.7500          0         3.500000                 0
max        14.0000      False        22.000000                 0

[8 rows x 4 columns]

Note that some columns cannot be summarised, as there is no logical way to summarise them, for instance columns containing string data.
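If a summary of the string columns is still wanted, describe accepts an include argument. The sketch below rebuilds the sample data from the question by hand (the construction is mine, not the asker's code) and asks for every column; for text columns pandas reports count, unique, top and freq rather than mean or max. Recent pandas releases may also leave boolean columns such as is_martian out of the default output, so include="all" is one way to bring them back.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand for illustration
df = pd.DataFrame({
    "shopper_num": range(1, 15),
    "is_martian": [False] * 14,
    "number_of_items": [0, 1, 0, 22, 0, 0, 5, 0, 4, 2, 0, 13, 0, 0],
    "count_pineapples": [0] * 14,
    "birth_country": ["MX"] * 12 + ["CA", "US"],
    "tranpsortation_method": [float("nan")] * 14,  # column with no information
})

# include="all" summarises every column, not only the numeric ones
summary = df.describe(include="all")

# For a text column only count/unique/top/freq are filled in
country_summary = summary.loc[["count", "unique", "top", "freq"], "birth_country"]
print(country_summary)
```

Rows that do not apply to a column (for example mean for birth_country) are filled with NaN.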

You can transpose the result if you prefer:

In [47]:

df.describe().transpose()

Out[47]:

                 count      mean       std    min   25%  50%    75%    max
shopper_num         14       7.5    4.1833      1  4.25  7.5  10.75     14
is_martian          14         0         0  False     0    0      0  False
number_of_items     14  3.357143  6.452276      0     0    0    3.5     22
count_pineapples    14         0         0      0     0    0      0      0

[4 rows x 8 columns]
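The other route mentioned in this answer, passing a list of functions to agg, can be sketched roughly as follows. This is an illustration under my own assumptions rather than the answerer's code: the sample data is rebuilt by hand, only numeric columns are kept (median is not defined for text), and the names stats, table and per_country are hypothetical. The first result has the columnname / max / min / median layout asked for in the question; the second applies the same list per group.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand for illustration
df = pd.DataFrame({
    "shopper_num": range(1, 15),
    "is_martian": [False] * 14,
    "number_of_items": [0, 1, 0, 22, 0, 0, 5, 0, 4, 2, 0, 13, 0, 0],
    "count_pineapples": [0] * 14,
    "birth_country": ["MX"] * 12 + ["CA", "US"],
    "tranpsortation_method": [float("nan")] * 14,  # column with no information
})

stats = ["max", "min", "median"]

# Whole-frame version: keep numeric columns, aggregate, then transpose
# so that each original column becomes one row
table = df.select_dtypes(include="number").agg(stats).T.rename_axis("columnname")

# Per-group version: the same list of functions applied within each country
per_country = df.groupby("birth_country")["number_of_items"].agg(stats)

print(table)
print(per_country)
```

The empty column shows up with NaN in every statistic, which is the closest pandas equivalent of the NA entries in the requested output.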
