Python: how to make a histogram with equally *sized* bins


Question

I have a set of data and want to make a histogram of it. I need the bins to have the same size, by which I mean that they must contain the same number of objects, rather than the more common (numpy.histogram) case of equally spaced bins. This naturally comes at the expense of the bin widths, which can, and in general will, differ.

I will specify the number of desired bins and the data set, and obtain the bin edges in return.

Example:
data = numpy.array([1., 1.2, 1.3, 2.0, 2.1, 2.12])
bins_edges = somefunc(data, nbins=3)
print(bins_edges)
>> [1.,1.3,2.1,2.12]

So the bins all contain 2 points, but their widths (0.3, 0.8, 0.02) are different.
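As a quick check of the example, the following sketch feeds the quoted edges back into numpy.histogram and confirms that every bin holds two points while the widths differ. The edges are copied from the example rather than computed, since somefunc is a placeholder in the question.

```python
import numpy as np

data = np.array([1., 1.2, 1.3, 2.0, 2.1, 2.12])
bins_edges = np.array([1., 1.3, 2.1, 2.12])  # edges copied from the example

# np.histogram treats every bin as half-open [left, right) except the last,
# which also includes its right edge, so each point lands in exactly one bin.
counts, _ = np.histogram(data, bins=bins_edges)
widths = np.diff(bins_edges)

print(counts)  # [2 2 2]
print(widths)  # approximately 0.3, 0.8, 0.02
```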

There are two limitations:
- if a group of data points is identical, the bin containing them could be bigger.
- if there are N data points and M bins are requested, there will be M bins of N/M points each (integer division), plus one extra bin if N%M is not 0.
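The second limitation is plain integer arithmetic. The sketch below follows what the question's code does (N // M points per requested bin, any remainder collected in one extra bin); bin_layout is a hypothetical helper name used only for illustration.

```python
def bin_layout(n_points, n_bins):
    """Hypothetical helper: (points per full bin, leftover points)."""
    # divmod returns the integer quotient and the remainder in one call.
    bin_size, leftover = divmod(n_points, n_bins)
    return bin_size, leftover

print(bin_layout(6, 3))  # (2, 0): three bins of 2 points, no extra bin
print(bin_layout(7, 3))  # (2, 1): three bins of 2 points, 1 point left over
```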

This piece of code is some cruft I've written, which worked nicely for small data sets. What if I have 10**9+ points and want to speed up the process?

import numpy as np

def def_equbin(in_distr, binsize=None, bin_num=None):
    # Note: binsize is accepted but never used; only bin_num drives the result.
    distr_size = len(in_distr)

    # Integer division: number of points in each full bin.
    bin_size = distr_size // bin_num
    # Points left over once bin_num full bins are filled.
    odd_bin_size = distr_size % bin_num

    # Indices that would sort the data.
    args = in_distr.argsort()

    # One row per bin, holding that bin's sorted values.
    hist = np.zeros((bin_num, bin_size))

    for i in range(bin_num):
        hist[i, :] = in_distr[args[i * bin_size: (i + 1) * bin_size]]

    if odd_bin_size == 0:
        odd_bin = None
        bins_limits = np.arange(bin_num) * bin_size
        bins_limits = args[bins_limits]
        bins_limits = np.concatenate((in_distr[bins_limits],
                                      [in_distr[args[-1]]]))
    else:
        # Leftover points go into one extra, smaller bin.
        odd_bin = in_distr[args[bin_num * bin_size:]]
        bins_limits = np.arange(bin_num + 1) * bin_size
        bins_limits = args[bins_limits]
        bins_limits = in_distr[bins_limits]
        bins_limits = np.concatenate((bins_limits, [in_distr[args[-1]]]))

    return (hist, odd_bin, bins_limits)

Recommended answer

Using your example case (bins of 2 points, 6 total data points):

from scipy import stats
bin_edges = stats.mstats.mquantiles(data, [0, 2./6, 4./6, 1])
>> array([1. , 1.24666667, 2.05333333, 2.12])
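The same call can be generalised to any number of bins by generating the probabilities with np.linspace. This is a minimal sketch, not part of the original answer: equal_count_edges is a hypothetical helper name, and it assumes the data has few repeated values (otherwise neighbouring edges may coincide, as noted in the question).

```python
import numpy as np
from scipy import stats

def equal_count_edges(data, nbins):
    """Hypothetical helper: edges splitting `data` into `nbins` bins that
    hold (approximately) the same number of points."""
    # nbins bins need nbins + 1 edges, at evenly spaced probabilities.
    probs = np.linspace(0.0, 1.0, nbins + 1)
    # mquantiles returns a masked array; convert it to a plain ndarray.
    return np.asarray(stats.mstats.mquantiles(data, probs))

data = np.array([1., 1.2, 1.3, 2.0, 2.1, 2.12])
edges = equal_count_edges(data, nbins=3)
counts, _ = np.histogram(data, bins=edges)

print(edges)
print(counts)  # [2 2 2]
```

np.quantile accepts the same probabilities, but its default interpolation differs from that of mquantiles, so the interior edges will not match exactly. Whether either call is fast enough for 10**9+ points would need to be measured.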
