How to find bin edges of given bin number returned by scipy.stats.binned_statistic_dd()?
Question
I have an Nx3 array mm. The function call
c, edg, idx = scipy.stats.binned_statistic_dd(mm, [], statistic='count', bins=(30,20,10), range=((3,5),(2,8),(4,6)))
returns idx, a 1-D array of ints giving the bin into which each row of mm falls, and edg, a list of 3 arrays holding the bin edges.
What I need is to find the bin edges of a given bin, given its bin number in idx. For example, given idx=[24,153,...,72], I want to find the edges of, say, bin 153, i.e. where that bin falls in edg. Of course I can find the elements in bin 153 with mm[idx == 153], but not the edges.
I posted this Nx3 case just for clarity. In reality, I am looking for a solution to the NxD case.
Recommended answer
It helps to first be familiar with np.unravel_index. It converts a "flat index" (i.e. a bin number) to a tuple of coordinates. You can think of the flat index as the index into arr.ravel(), and the tuple of coordinates as the index into arr. For example, if in the diagram below we think of the numbers 0,1,2,3,4,5 as bin numbers:
   | 0 | 1 | 2 |
---+---+---+---|
 0 | 0 | 1 | 2 |
 1 | 3 | 4 | 5 |
   +---+---+---|
then np.unravel_index(4, (2,3))
In [65]: np.unravel_index(4, (2,3))
Out[65]: (1, 1)
equals (1,1) because bin number 4 in an array of shape (2,3) has coordinate (1,1).
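As a quick check of the mapping in the diagram, here is a minimal sketch that unravels all six bin numbers at once and then inverts the mapping with np.ravel_multi_index. The variable names are only illustrative.

```python
import numpy as np

shape = (2, 3)  # two rows, three columns, as in the diagram

# Flat index -> coordinates, for every bin number at once.
rows, cols = np.unravel_index(np.arange(6), shape)
coords = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(coords)  # bin number 4 maps to (1, 1)

# np.ravel_multi_index is the inverse: coordinates -> flat index.
flat = np.ravel_multi_index((rows, cols), shape)
print(flat.tolist())
```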
Next, we need to know that scipy.stats.binned_statistic_dd internally adds two edges to the given bin edges to handle outliers:
bin_edges = [np.r_[-np.inf, edge, np.inf] for edge in bin_edges]
So the edge coordinates corresponding to the bin numbers are given by
edge_index = np.unravel_index(binnumber, [len(edge)-1 for edge in bin_edges])
(We use len(edge)-1 because the shape of the array axis is one less than the number of edges.)
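Since the question asks for the edges of a bin rather than only its corner, the same lookup can be wrapped in a small helper that returns both the lower and the upper edge along every dimension. This is only a sketch: bin_bounds is a hypothetical name, not part of SciPy, and it assumes bin_edges is the unpadded list of edge arrays returned by binned_statistic_dd and that binnumber uses the default flat numbering.

```python
import numpy as np

def bin_bounds(binnumber, bin_edges):
    """Return (lower, upper), each of shape (N, D), for flat bin numbers.

    Hypothetical helper: `bin_edges` is the list of D edge arrays as
    returned by binned_statistic_dd, without the -inf/inf padding.
    """
    # Pad each axis with the two outlier edges, as described above.
    padded = [np.r_[-np.inf, np.asarray(e, dtype=float), np.inf]
              for e in bin_edges]
    shape = [len(e) - 1 for e in padded]
    # One index array per dimension.
    idx = np.unravel_index(np.atleast_1d(binnumber), shape)
    lower = np.stack([e[i] for e, i in zip(padded, idx)], axis=-1)
    upper = np.stack([e[i + 1] for e, i in zip(padded, idx)], axis=-1)
    return lower, upper

# Made-up 2-D edges, for illustration only.
edges = [np.array([0., 1., 2.]), np.array([0., 10., 20., 30.])]
lower, upper = bin_bounds([7, 0], edges)
print(lower)  # bin 7 -> [0, 10]; bin 0 is the outlier bin -> [-inf, -inf]
print(upper)  # bin 7 -> [1, 20]; bin 0 -> [0, 0]
```

Because the helper loops over the list of edge arrays, it works for any number of dimensions D, which is the NxD case the question asks about.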
For example:
import itertools as IT
import numpy as np
import scipy.stats as stats

# 5 x 5 x 5 grid of sample points, including outliers on both sides
sample = np.array(list(IT.product(np.arange(5)-0.5,
                                  np.arange(5)*10-5,
                                  np.arange(5)*100-50)))
# explicit bin edges for each of the three dimensions
bins = [np.arange(4),
        np.arange(4)*10,
        np.arange(4)*100]

# values is not referenced when statistic='count'
statistic, bin_edges, binnumber = stats.binned_statistic_dd(
    sample=sample, values=sample, statistic='count',
    bins=bins,
    range=[(0,100)]*3)

# pad the edges the same way binned_statistic_dd does internally
bin_edges = [np.r_[-np.inf, edge, np.inf] for edge in bin_edges]
edge_index = np.unravel_index(binnumber, [len(edge)-1 for edge in bin_edges])

for samp, idx in zip(sample, zip(*edge_index)):
    # lower edge of the bin along each dimension
    vert = [edge[i] for i, edge in zip(idx, bin_edges)]
    print('{} goes in bin with left-most corner: {}'.format(samp, vert))
which yields
[ -0.5 -5. -50. ] goes in bin with left-most corner: [-inf, -inf, -inf]
[ -0.5 -5. 50. ] goes in bin with left-most corner: [-inf, -inf, 0.0]
[ -0.5 -5. 150. ] goes in bin with left-most corner: [-inf, -inf, 100.0]
[ -0.5 -5. 250. ] goes in bin with left-most corner: [-inf, -inf, 200.0]
...
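As an alternative to calling np.unravel_index yourself, newer SciPy versions accept expand_binnumbers=True, which returns the per-dimension bin indices directly as a (D, N) array. The sketch below uses made-up random data and assumes every sample lies inside the binning range, so no index is 0 (below range) or len(edges) (above range).

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)
sample = rng.uniform(0, 3, size=(50, 3))  # made-up data in [0, 3)
values = np.ones(len(sample))             # not referenced for 'count'
bins = [np.arange(4)] * 3                 # edges 0, 1, 2, 3 on each axis

stat, edges, idx_nd = stats.binned_statistic_dd(
    sample, values, statistic='count', bins=bins,
    expand_binnumbers=True)

# idx_nd[d, n] is the 1-based bin index of sample n along dimension d,
# so the bin spans edges[d][i - 1] to edges[d][i] for in-range samples.
lower = np.stack([e[i - 1] for e, i in zip(edges, idx_nd)], axis=-1)
upper = np.stack([e[i] for e, i in zip(edges, idx_nd)], axis=-1)
print(lower[:3])
print(upper[:3])
```

For samples outside the range, the i - 1 lookup would wrap around, so the padded-edge approach above is the safer choice when outliers are possible.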