如何在直方图箱中获取数据 [英] How to get data in a histogram bin
问题描述
我想获取直方图bin中包含的数据列表.我正在使用numpy和Matplotlib.我知道如何遍历数据并检查垃圾箱边缘.但是,我想对2D直方图执行此操作,并且执行此操作的代码非常难看. numpy是否有任何使它更容易的构造?
I want to get a list of the data contained in a histogram bin. I am using numpy, and Matplotlib. I know how to traverse the data and check the bin edges. However, I want to do this for a 2D histogram and the code to do this is rather ugly. Does numpy have any constructs to make this easier?
对于一维情况,我可以使用searchsorted().但是逻辑并没有更好,我真的不想在不需要时对每个数据点进行二进制搜索.
For the 1D case, I can use searchsorted(). But the logic is not that much better, and I don’t really want to do a binary search on each data point when I don’t have to.
大多数讨厌的逻辑是由于bin边界区域.所有区域都有这样的边界:[左边缘,右边缘).最后一个垃圾箱除外,其区域如下:[左边缘,右边缘].
Most of the nasty logic is due to the bin boundary regions. All regions have boundaries like this: [left edge, right edge). Except the last bin, which has a region like this: [left edge, right edge].
以下是一维情况的一些示例代码:
Here is some sample code for the 1D case:
import numpy as np
data = [0, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 3]
hist, edges = np.histogram(data, bins=3)
print 'data =', data
print 'histogram =', hist
print 'edges =', edges
getbin = 2 #0, 1, or 2
print '---'
print 'alg 1:'
#for i in range(len(data)):
for d in data:
if d >= edges[getbin]:
if (getbin == len(edges)-2) or d < edges[getbin+1]:
print 'found:', d
#end if
#end if
#end for
print '---'
print 'alg 2:'
for d in data:
val = np.searchsorted(edges, d, side='right')-1
if val == getbin or val == len(edges)-1:
print 'found:', d
#end if
#end for
以下是2D情况的一些示例代码:
Here is some sample code for the 2D case:
import numpy as np
xdata = [0, 1.5, 1.5, 2.5, 2.5, 2.5, \
0.5, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, \
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 3]
ydata = [0, 5,5, 5, 5, 5, \
15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, \
25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 30]
xbins = 3
ybins = 3
hist2d, xedges, yedges = np.histogram2d(xdata, ydata, bins=(xbins, ybins))
print 'data2d =', zip(xdata, ydata)
print 'hist2d ='
print hist2d
print 'xedges =', xedges
print 'yedges =', yedges
getbin2d = 5 #0 through 8
print 'find data in bin #', getbin2d
xedge_i = getbin2d % xbins
yedge_i = int(getbin2d / xbins) #IMPORTANT: this is xbins
for x, y in zip(xdata, ydata):
# x and y left edges
if x >= xedges[xedge_i] and y >= yedges[yedge_i]:
#x right edge
if xedge_i == xbins-1 or x < xedges[xedge_i + 1]:
#y right edge
if yedge_i == ybins-1 or y < yedges[yedge_i + 1]:
print 'found:', x, y
#end if
#end if
#end if
#end for
是否有一种更清洁/更有效的方式来做到这一点?看来numpy对此有所帮助.
Is there a cleaner / more efficient way to do this? It seems like numpy would have something for this.
推荐答案
digitize
将为您提供直方图中每个值所属的bin的 index :
import numpy as NP
A = NP.random.randint(0, 10, 100)
bins = NP.array([0., 20., 40., 60., 80., 100.])
# d is an index array holding the bin id for each point in A
d = NP.digitize(A, bins)
这篇关于如何在直方图箱中获取数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!