Output in scipy.stats.binned_statistic_dd()


Question

I am trying to use scipy.stats.binned_statistic_dd and I can't for the life of me figure out its outputs. Does anyone have any advice?

Consider this simple sample program:

import scipy
import scipy.stats  # the stats submodule must be imported explicitly
scipy.__version__
# '0.14.0'
import numpy as np
print(scipy.stats.binned_statistic_dd([np.ones(10), np.ones(10)], np.arange(10), 'count', bins=3))
#(array([[  0.,   0.,   0.],
#       [  0.,  10.,   0.],
#       [  0.,   0.,   0.]]), 
# [array([ 0.5       ,  0.83333333,  1.16666667,  1.5       ]), 
# array([ 0.5       ,  0.83333333,  1.16666667,  1.5       ])], 
# array([12, 12, 12, 12, 12, 12, 12, 12, 12, 12]))

So the documentation claims the outputs are:

statistic : ndarray, shape(nx1, nx2, nx3,...) The values of the selected statistic in each two-dimensional bin

edges : list of ndarrays A list of D arrays describing the (nxi + 1) bin edges for each dimension

binnumber : 1-D ndarray of ints This assigns to each observation an integer that represents the bin in which this observation falls. Array has the same length as values.

In the example, the statistic makes good sense: I asked for the 'count' and got 10, since all 10 elements fall in the same bin. The edges make good sense too: the data to be binned was 2-dimensional and I wanted 3 bins, so I got 4 edges per dimension, which is reasonable.

Then there is binnumber, which makes no sense to me at all: array([12, 12, 12, 12, 12, 12, 12, 12, 12, 12]). There are indeed 10 numbers, the same length as the input values, np.arange(10), but the number 12 makes no sense. What am I missing? 12 is not an unraveled index over the bins viewed as a multi-dimensional array; since there are 3 bins in each dimension, I would expect numbers up to 9. What is 12 telling me?

Recommended answer

The values in binnumber are flattened (linear) indices into a grid of bins that includes an extra set of "out of range" bins around the requested ones.

In this example,

In [40]: hst, edges, bincounts = binned_statistic_dd([np.ones(10), np.ones(10)], None, 'count', bins=3)

In [41]: hst
Out[41]: 
array([[  0.,   0.,   0.],
       [  0.,  10.,   0.],
       [  0.,   0.,   0.]])

the bins are numbered as follows:

  0  |  1  |  2  |  3  |  4
-----+-----+-----+-----+-----
  5  |  6  |  7  |  8  |  9
-----+-----+-----+-----+-----
 10  | 11  | 12  | 13  | 14 
-----+-----+-----+-----+-----
 15  | 16  | 17  | 18  | 19
-----+-----+-----+-----+-----
 20  | 21  | 22  | 23  | 24

The "out of range" bins are not included in hst; the data in hst corresponds to bin numbers 6, 7, 8, 11, 12, 13, 16, 17 and 18. All the points fall in the central bin of hst, which is bin number 12. That's why all the values in bincounts (which, despite its name, holds the returned bin numbers) are 12:

In [42]: bincounts
Out[42]: array([12, 12, 12, 12, 12, 12, 12, 12, 12, 12])
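One way to check this numbering, as a minimal sketch: assuming the linear index is laid out in row-major (C) order over a grid with one extra outlier bin on each side of every dimension, np.unravel_index recovers the per-dimension index, and subtracting 1 gives the position in hst. The names nbins, padded_shape, grid and stat_index are illustrative and not part of SciPy.

```python
import numpy as np

nbins = 3                               # bins requested per dimension
padded_shape = (nbins + 2, nbins + 2)   # one outlier bin on each side

# The padded grid of linear bin numbers, matching the table above.
grid = np.arange(np.prod(padded_shape)).reshape(padded_shape)

# Linear bin number 12 -> per-dimension indices in the padded grid.
row, col = np.unravel_index(12, padded_shape)

# Dropping the outlier offset gives the index into the statistic array.
stat_index = (int(row) - 1, int(col) - 1)

print(grid[1:-1, 1:-1])  # the bin numbers that hst actually covers
print(stat_index)        # (1, 1), the cell of hst holding the count of 10
```

The interior slice grid[1:-1, 1:-1] contains exactly the bin numbers 6, 7, 8, 11, 12, 13, 16, 17 and 18 listed above.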

You can use the range argument to force the counts into the outer bins. For example, set the ranges of the coordinates to [2, 3] and [0, 0.5], so that all the values in the first coordinate are to the left of their range and all the values in the second coordinate are to the right of theirs. All the points then end up in the upper-right outer bin, which is bin index 4:

In [51]: binned_statistic_dd([np.ones(10), np.ones(10)], None, 'count', bins=3, range=[[2,3],[0,0.5]])
Out[51]: 
(array([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]]),
 [array([ 2.        ,  2.33333333,  2.66666667,  3.        ]),
  array([ 0.        ,  0.16666667,  0.33333333,  0.5       ])],
 array([4, 4, 4, 4, 4, 4, 4, 4, 4, 4]))
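The same decoding can be applied to this out-of-range case. The sketch below assumes the row-major padded layout described above; padded_shape, per_dim and in_range are illustrative names, and a real values array is passed instead of None only to keep the call valid across SciPy versions ('count' does not use it).

```python
import numpy as np
from scipy.stats import binned_statistic_dd

sample = [np.ones(10), np.ones(10)]
values = np.arange(10)  # not used by 'count'; any array of matching length works

stat, edges, binnumber = binned_statistic_dd(
    sample, values, 'count', bins=3, range=[[2, 3], [0, 0.5]]
)

# Each dimension has len(edges) - 1 real bins plus two outlier bins.
padded_shape = tuple(len(e) + 1 for e in edges)  # (5, 5) here
per_dim = np.array(np.unravel_index(binnumber, padded_shape))  # shape (2, 10)

# A point is in range only if every per-dimension index lies in 1..nbins.
nbins = np.array(padded_shape)[:, None] - 2
in_range = np.all((per_dim >= 1) & (per_dim <= nbins), axis=0)

print(per_dim[:, 0])   # per-dimension indices of the first point
print(in_range.any())  # whether any point landed inside the statistic array
```

An index of 0 in a dimension means the point lies below that dimension's range, and an index of nbins + 1 means it lies above. Recent SciPy versions also accept expand_binnumbers=True, which returns the per-dimension indices directly instead of the linear ones.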
