Matplotlib histogram not counting correctly the number of values in each bin

Problem description

I am trying to make a very simple histogram with matplotlib.pyplot.hist, but it does not seem to count the number of values in each bin correctly. Here is my code:

    import numpy as np
    import matplotlib.pyplot as plt
    plt.hist([.2, .3, .5, .6], bins=np.arange(0, 1.1, .1))

I am dividing the interval [0,1] into bins of width .1, so I should get four bars of height 1. But the output figure consists of only two bars of height 2: it is counting the .3 value as part of the [.2,.3) bin and, similarly, it is counting the .6 value as part of the [.5,.6) bin. I have tried it on both Spyder and Google Colab. Does anyone know what's going on? Thanks!

Solution

The problem is that the values fall exactly on the boundaries of the bins. Floating-point rounding can put them in either the previous or the next bin. You need bin boundaries that lie well in between the data points. Note that matplotlib's histogram is primarily meant for continuous distributions, where floating-point rounding doesn't have such large effects.
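To see the rounding directly, one can print the bin edges at full precision and count with np.histogram instead of plotting. This is only a minimal sketch: the names `data`, `edges` and `counts` are chosen here for illustration, and it assumes the usual IEEE 754 double arithmetic, in which `np.arange` builds each edge as a multiple of the step.

```python
import numpy as np

data = [0.2, 0.3, 0.5, 0.6]
edges = np.arange(0, 1.1, 0.1)

# Show the edges next to the data values at full precision.
# Some edges are not the same double as the literal you would write by hand.
for i in (2, 3, 5, 6):
    print(i, repr(float(edges[i])))

# A value that is slightly smaller than "its" edge is counted in the bin to the left.
print(0.3 < edges[3], 0.6 < edges[6])

# Same counting rule as plt.hist, without drawing anything.
counts, _ = np.histogram(data, bins=edges)
print(counts)  # two bins with 2 values each, as reported in the question
```

The edges at indices 3 and 6 come out a hair above 0.3 and 0.6, which is why those two values end up in the bins starting at 0.2 and 0.5.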

Here is some code to illustrate what's happening in both situations:

    import numpy as np
    import matplotlib.pyplot as plt

    data = [.2, .3, .5, .6]

    fig, axes = plt.subplots(ncols=2, figsize=(12, 4))

    for ax in axes:
        if ax is axes[0]:
            # Left: bin edges coincide with the data values
            bins = np.arange(0, 1.1, .1)
            ax.set_title('data on bin boundaries')
        else:
            # Right: bin edges shifted by half a bin width, so the data sit mid-bin
            bins = np.arange(-0.05, 1.1, .1)
            ax.set_title('data between bin boundaries')
        values, bin_bounds, bars = ax.hist(data, bins=bins, alpha=0.3)

        # Mark the bin edges (dotted lines) and the data points (dots)
        ax.vlines(bin_bounds, 0, max(values), color='crimson', ls=':')
        ax.scatter(data, np.full_like(data, 0.5), color='lime', s=30)
        ax.set_ylim(0, 2.2)
        ax.set_yticks(range(3))
    plt.show()
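The effect of shifting the bin edges can also be checked numerically. The following is a rough sketch using np.histogram, with the same half-bin-width shift as the bins = np.arange(-0.05, 1.1, .1) case; the variable names `shifted_edges` and `counts` are illustrative only.

```python
import numpy as np

data = [0.2, 0.3, 0.5, 0.6]

# Edges shifted by half a bin width (0.05), so each data value sits mid-bin
# and tiny floating-point errors in the edges can no longer change its bin.
shifted_edges = np.arange(-0.05, 1.1, 0.1)

counts, _ = np.histogram(data, bins=shifted_edges)
print(counts)  # four separate bins with one value each
```

Because every value is now about 0.05 away from the nearest edge, an error on the order of 1e-16 in the edges has no influence on the counts.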
