How can I get descriptive statistics of a NumPy array?


Problem Description

I use the following code to create a NumPy ndarray. The file has 9 columns, and I explicitly specify the type of each column:

```python
import numpy as np

dataset = np.genfromtxt("data.csv", delimiter=",",
                        dtype=('|S1', float, float, float, float, float, float, float, int))
```

Now I would like to get some descriptive statistics (min, max, standard deviation, mean, median, etc.) for each column. Shouldn't there be an easy way to do this?

I tried:

```python
from scipy import stats
stats.describe(dataset)
```

But this raises an error: TypeError: cannot perform reduce with flexible type

How can I get descriptive statistics of the NumPy array created this way?

Recommended Answer

This is not a pretty solution, but it gets the job done. The problem is that by specifying multiple dtypes, you are essentially creating a 1-D structured array whose elements are records (np.void scalars, which behave like tuples) rather than a 2-D array of numbers. scipy.stats.describe cannot reduce over such an array, because each record mixes several types, including strings.
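To make the structured-array explanation concrete, here is a minimal sketch. It assumes a small made-up CSV held in an io.StringIO buffer in place of data.csv, and it uses hypothetical field names (label, x, y, n) for readability. With the question's unnamed sequence of types, genfromtxt generates the names f0, f1, and so on instead. The array is 1-D, and each numeric field can still be pulled out by name and summarised on its own.

```python
import io

import numpy as np

# Made-up 4-column stand-in for data.csv: one string column, two floats, one int.
csv_text = "a,0.35,1.0,3\nb,0.71,2.0,5\nc,0.50,4.0,7\n"

# Hypothetical explicit field names; a plain sequence of types gets f0, f1, ... instead.
data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    dtype=[("label", "S1"), ("x", float), ("y", float), ("n", int)],
)
print(data.shape)        # (3,): one record per line, no second axis
print(data.dtype.names)  # ('label', 'x', 'y', 'n'): the "columns" are fields

# Keep only numeric fields (float 'f', signed int 'i', unsigned int 'u').
numeric = [name for name in data.dtype.names if data.dtype[name].kind in "fiu"]

for name in numeric:
    col = data[name]  # a plain 1-D numeric array, so reductions work again
    print(name, col.min(), col.max(), col.mean(), np.median(col), col.std(ddof=1))

# Or stack the numeric fields into an ordinary 2-D float array (rows x columns).
table = np.column_stack([data[name].astype(float) for name in numeric])
print(table.mean(axis=0))
```

This approach avoids re-reading the file. The trade-off is that the string column stays separate, which is exactly what stats.describe needs.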

This can be resolved either by reading the file in two passes (the numeric columns and the string column separately) or by using pandas with read_csv.
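For the pandas route, a minimal sketch follows. It assumes pandas is installed and that the file has no header row (hence header=None). It also uses a made-up StringIO buffer in place of data.csv. read_csv infers a dtype per column, and DataFrame.describe() summarises only the numeric columns by default. The 50% row is the median.

```python
import io

import pandas as pd

# Made-up stand-in for data.csv; header=None assumes the file has no header row.
csv_text = "a,0.35,1.0,3\nb,0.71,2.0,5\nc,0.50,4.0,7\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Column 0 (letters) is object dtype and is skipped; the rest are summarised with
# count, mean, std, min, 25%, 50% (the median), 75% and max.
summary = df.describe()
print(summary)
```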

If you decide to stick with NumPy:

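```python
import numpy as np
from scipy import stats

# sample.txt stands in for the question's data.csv (same 9-column layout).
# unpack=True transposes the result, so each row of `a` holds one column of the file.
a = np.genfromtxt('sample.txt', delimiter=",", unpack=True, usecols=range(1, 9))
# Read the one-character string column separately.
s = np.genfromtxt('sample.txt', delimiter=",", unpack=True, usecols=0, dtype='|S1')

# One describe() per column; stats.describe(a, axis=1) would return all of them in one call.
for arr in a:
    print(stats.describe(arr))

# Example output of one print call:
# DescribeResult(nobs=6, minmax=(0.34999999999999998, 0.70999999999999996), mean=0.54500000000000004, variance=0.016599999999999997, skewness=-0.3049304880932534, kurtosis=-0.9943046886340534)
```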
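stats.describe reports nobs, minmax, mean, variance, skewness and kurtosis, but not the median or standard deviation that the question asks for. The sketch below uses only NumPy to add those. It assumes `a` has the layout produced by unpack=True, with one row per column of the file. The small array is made-up stand-in data.

```python
import numpy as np

# Stand-in for `a`: one row per numeric column, as unpack=True produces (made-up values).
a = np.array([[0.35, 0.71, 0.50, 0.62],
              [1.0, 2.0, 4.0, 3.0]])

stats_by_column = {
    "min": a.min(axis=1),
    "max": a.max(axis=1),
    "mean": a.mean(axis=1),
    "median": np.median(a, axis=1),
    # ddof=1 gives the sample standard deviation, consistent with the
    # (ddof=1) variance that stats.describe reports.
    "std": a.std(axis=1, ddof=1),
}

for col in range(a.shape[0]):
    print(col, {name: float(values[col]) for name, values in stats_by_column.items()})
```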

Note that in this example the final array has dtype float, not int, because genfromtxt reads every selected column as float by default. If necessary, it can easily be converted with arr.astype(int).
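In the same two-pass spirit, the integer column could also be read on its own with dtype=int, which avoids the conversion entirely. The sketch below assumes a made-up 3-column CSV in which the integer column is index 2; in the question's file it would be usecols=8. It also shows that astype(int) truncates toward zero rather than rounding, which matters if the float values are not whole numbers.

```python
import io

import numpy as np

# Made-up 3-column file; the integer column is index 2 here (usecols=8 in the question's file).
csv_text = "a,0.35,3\nb,0.71,5\nc,0.50,7\n"

# Read only the integer column, directly with an integer dtype.
counts = np.genfromtxt(io.StringIO(csv_text), delimiter=",", usecols=2, dtype=int)

# When converting a float array instead, astype(int) truncates toward zero.
floats = np.array([2.0, 2.7, -2.7])
truncated = floats.astype(int)          # [2, 2, -2]
rounded = np.rint(floats).astype(int)   # [2, 3, -3]: round first if that is what you want
print(counts, truncated, rounded)
```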
