How to find the bin edges of a given bin number returned by scipy.stats.binned_statistic_dd()?

Question

I have an Nx3 array mm. The function call

c, edg, idx = scipy.stats.binned_statistic_dd(mm, [], statistic='count', bins=(30,20,10), range=((3,5),(2,8),(4,6)))

returns idx, which is a 1-D array of ints giving the bin in which each element of mm falls, and edg, which is a list of 3 arrays holding the bin edges.

What I need is to find the bin edges of a given bin, given its bin number in idx. For example, given idx=[24,153,...,72], I want to find the edges of, say, bin 153, i.e. where that bin falls in edg. Of course I can find the elements in bin 153 with mm[idx == 153], but not the edges.

I posted this Nx3 case just for clarity. In reality, I am looking for a solution to the NxD case.

Answer

It helps to first be familiar with np.unravel_index. It converts a "flat index" (i.e. binnumber!) to a tuple of coordinates. You can think of the flat index as the index into arr.ravel(), and the tuple of coordinates as the index into arr. For example, if in the diagram below we think of the numbers 0,1,2,3,4,5 as bin numbers:

   | 0 | 1 | 2 |
---+---+---+---|
 0 | 0 | 1 | 2 |
 1 | 3 | 4 | 5 |
   +---+---+---|

Then np.unravel_index(4, (2,3))

In [65]: np.unravel_index(4, (2,3))
Out[65]: (1, 1)

equals (1,1) because bin number 4 in an array of shape (2,3) has coordinates (1,1).
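As a quick check of the diagram, the NumPy-only sketch below converts between bin numbers and coordinates on the (2,3) grid with np.unravel_index and its inverse, np.ravel_multi_index:

```python
import numpy as np

shape = (2, 3)  # 2 rows x 3 columns, as in the diagram

# Flat bin number -> (row, column) coordinates
coords = np.unravel_index(4, shape)
print(tuple(int(c) for c in coords))  # (1, 1)

# (row, column) coordinates -> flat bin number
flat = np.ravel_multi_index((1, 1), shape)
print(int(flat))  # 4

# All six bin numbers at once, in row-major order like the diagram
rows, cols = np.unravel_index(np.arange(6), shape)
print(rows.tolist())  # [0, 0, 0, 1, 1, 1]
print(cols.tolist())  # [0, 1, 2, 0, 1, 2]
```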

Okay then. Next, we need to know that internally scipy.stats.binned_statistic_dd reserves an outlier bin at each end of every axis, which amounts to adding two edges (-inf and +inf) to the given bin edges:

bin_edges = [np.r_[-np.inf, edge, np.inf] for edge in bin_edges]

So the edge indices corresponding to the bin numbers are given by

edge_index = np.unravel_index(binnumber, [len(edge)-1 for edge in bin_edges])

(We use len(edge)-1 because the shape of the array axis is one less than the number of edges.)
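To see the outlier bins in action, here is a small sketch with made-up 2-D data in which one x value lies below the first edge and one lies above the last. It checks that the flat bin numbers unravel on a grid with len(edge)+1 bins per axis and, assuming a SciPy version whose binned_statistic_dd accepts expand_binnumbers=True, that SciPy's own per-axis bin indices agree with the manual unravelling.

```python
import numpy as np
import scipy.stats as stats

# Made-up 2-D sample: the x values -1.0 and 9.0 fall outside the x edges
sample = np.array([[-1.0, 0.5],
                   [0.5, 0.5],
                   [2.5, 1.5],
                   [9.0, 1.5]])
edges = [np.array([0.0, 1.0, 2.0, 3.0]),  # 3 regular bins along x
         np.array([0.0, 1.0, 2.0])]       # 2 regular bins along y

# values is not referenced when statistic='count'
statistic, bin_edges, binnumber = stats.binned_statistic_dd(
    sample, values=np.zeros(len(sample)), statistic='count', bins=edges)

# Pad each axis with -inf/+inf; the grid then has len(edge)+1 bins per axis
padded = [np.r_[-np.inf, edge, np.inf] for edge in bin_edges]
grid_shape = tuple(len(edge) - 1 for edge in padded)
print(grid_shape)          # (5, 4)
print(binnumber.tolist())  # [1, 5, 14, 18]

ix, iy = np.unravel_index(binnumber, grid_shape)
print(ix.tolist())  # [0, 1, 3, 4]: 0 is the low outlier bin, 4 the high one
print(iy.tolist())  # [1, 1, 2, 2]

# Cross-check: SciPy can return the per-axis indices itself, shape (D, N)
_, _, expanded = stats.binned_statistic_dd(
    sample, values=np.zeros(len(sample)), statistic='count', bins=edges,
    expand_binnumbers=True)
print(np.array_equal(expanded, np.array([ix, iy])))  # True
```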

For example:

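import itertools as IT
import numpy as np
import scipy.stats as stats

# 5x5x5 grid of sample points; on each axis the first value lies below the
# lowest edge and the last lies above the highest edge (outliers)
sample = np.array(list(IT.product(np.arange(5)-0.5, 
                                  np.arange(5)*10-5, 
                                  np.arange(5)*100-50)))
# Explicit bin edges for each of the 3 dimensions
bins = [np.arange(4),
        np.arange(4)*10,
        np.arange(4)*100]

# values is not referenced when statistic='count'; range is only used
# when bins are given as counts, so it is ignored here
statistic, bin_edges, binnumber = stats.binned_statistic_dd(
    sample=sample, values=sample, statistic='count', 
    bins=bins, 
    range=[(0,100)]*3)

# Pad the edges with -inf/+inf to match the internal outlier bins
bin_edges = [np.r_[-np.inf, edge, np.inf] for edge in bin_edges]
# Convert each flat bin number into one edge index per dimension
edge_index = np.unravel_index(binnumber, [len(edge)-1 for edge in bin_edges])


for samp, idx in zip(sample, zip(*edge_index)):
    # Along each axis, bin i starts at edge[i] (and ends at edge[i+1]);
    # float() keeps the printed list free of NumPy scalar reprs
    vert = [float(edge[i]) for i, edge in zip(idx, bin_edges)]
    print('{} goes in bin with left-most corner: {}'.format(samp, vert))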

yields

[ -0.5  -5.  -50. ] goes in bin with left-most corner: [-inf, -inf, -inf]
[ -0.5  -5.   50. ] goes in bin with left-most corner: [-inf, -inf, 0.0]
[  -0.5   -5.   150. ] goes in bin with left-most corner: [-inf, -inf, 100.0]
[  -0.5   -5.   250. ] goes in bin with left-most corner: [-inf, -inf, 200.0]
...
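The loop above reports only the lower corner of each bin, while the question asks for the bin's edges. Since bin i along an axis spans padded_edge[i] to padded_edge[i+1], the same unravelling gives both bounds for any number of dimensions. A minimal sketch follows; bin_bounds is a hypothetical helper name (not part of SciPy), and it assumes bin_edges is the unpadded list returned by binned_statistic_dd.

```python
import numpy as np

def bin_bounds(binnumber, bin_edges):
    """Hypothetical helper: lower and upper edges of the bin(s) `binnumber`.

    binnumber: flat bin number(s) as returned by binned_statistic_dd
    bin_edges: list of D edge arrays as returned by binned_statistic_dd
    Returns two arrays of shape np.shape(binnumber) + (D,). Outlier bins
    get -inf as their lower edge or +inf as their upper edge.
    """
    # Same padding and unravelling as in the answer
    padded = [np.r_[-np.inf, edge, np.inf] for edge in bin_edges]
    shape = [len(edge) - 1 for edge in padded]
    edge_index = np.unravel_index(np.asarray(binnumber), shape)
    # Along each axis, bin i spans padded[i] to padded[i + 1]
    lower = np.stack([edge[i] for edge, i in zip(padded, edge_index)], axis=-1)
    upper = np.stack([edge[i + 1] for edge, i in zip(padded, edge_index)], axis=-1)
    return lower, upper


# Usage with made-up edges: 3 bins along x, 2 along y (grid shape (5, 4))
edges = [np.array([0.0, 1.0, 2.0, 3.0]), np.array([0.0, 10.0, 20.0])]
lower, upper = bin_bounds([6, 0], edges)
print(lower.tolist())  # [[0.0, 10.0], [-inf, -inf]]
print(upper.tolist())  # [[1.0, 20.0], [0.0, 0.0]]
```

With the question's bins=(30,20,10), binned_statistic_dd returns 31, 21 and 11 edges, so the grid shape is (32, 22, 12) and bin 153 unravels to the indices (0, 12, 9); index 0 on the first axis is the low outlier bin, so bin_bounds(153, edg) would report -inf as its lower x edge.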
