Matplotlib histogram misplaced and missing bars

Question

I have large data files, so I am using numpy's histogram (the same one matplotlib uses) to build histograms manually and update them batch by batch. However, when I plot the result, the graph looks shifted.

This is the code I use to create and update the histograms in batches. Note that all histograms share the same bins.

temp = np.histogram(batch, bins=np.linspace(0, 40, 41))
hist += temp[0]

The code above is repeated as I parse the data files. For example, a small data set ends up with the following final histogram data:

[8190, 666, 278, 145, 113, 83, 52, 48, 45, 44, 45, 29, 28, 45, 29, 15, 16, 10, 17, 7, 15, 6, 10, 7, 3, 5, 7, 4, 2, 3, 0, 1, 0, 0, 0, 0, 0, 0, 0, 29]
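For reference, the batch update described above can be sketched in a self-contained form. The random batches and the names `edges`, `batches`, and `rng` are placeholders introduced here for illustration, not the asker's actual data or code.

```python
import numpy as np

# Shared bin edges: 41 edges define 40 bins, reused for every batch.
edges = np.linspace(0, 40, 41)

# Running total of counts, one slot per bin.
hist = np.zeros(len(edges) - 1, dtype=np.int64)

# Placeholder data standing in for chunks read from the large files.
rng = np.random.default_rng(0)
batches = [rng.uniform(0, 40, size=1000) for _ in range(3)]

for batch in batches:
    counts, _ = np.histogram(batch, bins=edges)
    hist += counts  # accumulate per-bin counts across batches

# Because the edges never change, the accumulated counts match a single
# histogram over all the data at once.
all_at_once, _ = np.histogram(np.concatenate(batches), bins=edges)
print(np.array_equal(hist, all_at_once))
```

Keeping one `edges` array and passing it to every call is what makes the per-batch counts safe to add.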

Below is the plotting code.

import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
import numpy as np
plt.xticks(np.linspace(0, 1, 11))
plt.hist([i/40 for i in range(40)], bins=np.linspace(0, 1, 41), weights=scores, rwidth=0.7)
plt.yscale('log', nonpositive='clip')  # spelled nonposy='clip' before Matplotlib 3.3

The resulting figure is quite strange. It shows no bar at [0.475, 0.5), and I expect the last bin, [0.975, 1.0], to hold the final count of 29. Instead, that bar appears at the [0.950, 0.975) position. I thought this might have to do with using bins and linspace, but the decoy x-array and the weights have the same length.

I've never seen this kind of behavior. I also wondered whether it comes from the bins being half-open ranges [x, x + width), but I haven't had issues with that before.

A note on using linspace: it specifies edges, so 40 bins are specified by 41 edges.

In [2]: np.linspace(0,1,41)                                                     
Out[2]: 
array([0.   , 0.025, 0.05 , 0.075, 0.1  , 0.125, 0.15 , 0.175, 0.2  ,
       0.225, 0.25 , 0.275, 0.3  , 0.325, 0.35 , 0.375, 0.4  , 0.425,
       0.45 , 0.475, 0.5  , 0.525, 0.55 , 0.575, 0.6  , 0.625, 0.65 ,
       0.675, 0.7  , 0.725, 0.75 , 0.775, 0.8  , 0.825, 0.85 , 0.875,
       0.9  , 0.925, 0.95 , 0.975, 1.   ])

In [3]: len(np.linspace(0,1,41))                                                
Out[3]: 41

Answer

It seems you're using plt.hist with the idea of putting one value into each bin, thereby simulating a bar plot. Because the x-values fall exactly on the bin boundaries, floating-point rounding can push some of them into the neighboring bin. That can be mitigated by shifting the x-values by half a bin width. The simplest approach is to draw the bars directly.
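The rounding effect can also be inspected without plotting. The following is a minimal sketch, assuming the same x-values and edges as in the question; the names `on_edge`, `centered`, and `edges` are introduced here for illustration. Which bins end up empty or doubled depends on floating-point details, so the printed counts are best checked on your own machine.

```python
import numpy as np

edges = np.linspace(0, 1, 41)                     # 41 edges -> 40 bins
on_edge = np.array([i / 40 for i in range(40)])   # x-values as in the question
centered = on_edge + (edges[1] - edges[0]) / 2    # shifted by half a bin width

counts_on_edge, _ = np.histogram(on_edge, bins=edges)
counts_centered, _ = np.histogram(centered, bins=edges)

# Any entry other than 1 marks a bin that lost or gained a value.
print("on-edge: ", counts_on_edge)
print("centered:", counts_centered)

# Indices where i/40 and the corresponding linspace edge are not bit-identical.
print("differs: ", np.nonzero(on_edge != edges[:-1])[0])
```

Values placed in the middle of a bin are far from both edges, so tiny rounding differences cannot move them into a neighbor.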

The following code creates a bar plot from the given data, with each bar at the center of the region it represents. As a check, the code reads each bar's height back at the end and displays it above the bar.

from matplotlib.ticker import MultipleLocator
import matplotlib.pyplot as plt
import numpy as np

scores = [8190, 666, 278, 145, 113, 83, 52, 48, 45, 44, 45, 29, 28, 45, 29, 15, 16, 10, 17, 7, 15, 6, 10, 7, 3, 5, 7, 4, 2, 3, 0, 1, 0, 0, 0, 0, 0, 0, 0, 29]
binbounds = np.linspace(0, 1, 41)  # 41 edges -> 40 bins
rwidth = 0.7
width = binbounds[1] - binbounds[0]
# One bar per bin, centered in the interval it represents
bars = plt.bar(binbounds[:-1] + width / 2, height=scores, width=width * rwidth, align='center')
plt.gca().xaxis.set_major_locator(MultipleLocator(0.1))
plt.gca().xaxis.set_minor_locator(MultipleLocator(0.05))
plt.yscale('log', nonpositive='clip')  # spelled nonposy='clip' before Matplotlib 3.3
# Check: read each bar's height back and label the bar with it
for rect in bars:
    x, y = rect.get_xy()
    w = rect.get_width()
    h = rect.get_height()
    plt.text(x + w / 2, h, f'{h}\n', ha='center', va='center')
plt.show()

PS: To see what's happening with the original histogram, just make a test plot without the weights:

plt.hist([i/40 for i in range(40)], bins=np.linspace(0, 1, 41), rwidth=1, ec='k')
plt.plot([i/40 for i in range(40)], [0.5] * 40, 'ro')
plt.xticks(np.linspace(0, 1, 11))

The red dots show where the x-values are. Some fall into the correct bin; others fall into the neighboring bin, which then holds 2 values.

To create a histogram with the x-values at the center of each bin:

plt.hist([i/40 + 1/80 for i in range(40)], bins=np.linspace(0, 1, 41), rwidth=1, ec='k')
plt.plot([i/40 + 1/80 for i in range(40)], [0.5] * 40, 'ro')
plt.xticks(np.linspace(0, 1, 11))
plt.yticks([0, 1])
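Putting the pieces together, a weighted plt.hist call with centered x-values might look like the sketch below. It reuses the `scores` list from above; the names `edges` and `centers` and the output filename are assumptions made for this example. The counts returned by `hist` are compared with `scores` as a sanity check.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, as in the question
import matplotlib.pyplot as plt
import numpy as np

scores = [8190, 666, 278, 145, 113, 83, 52, 48, 45, 44, 45, 29, 28, 45, 29,
          15, 16, 10, 17, 7, 15, 6, 10, 7, 3, 5, 7, 4, 2, 3, 0, 1, 0, 0, 0,
          0, 0, 0, 0, 29]
edges = np.linspace(0, 1, 41)
centers = (edges[:-1] + edges[1:]) / 2  # one x-value in the middle of each bin

fig, ax = plt.subplots()
# Each center carries its precomputed count as a weight.
n, _, _ = ax.hist(centers, bins=edges, weights=scores, rwidth=0.7)
ax.set_yscale("log")
ax.set_xticks(np.linspace(0, 1, 11))
fig.savefig("centered_hist.png")  # hypothetical output filename

# Every bin should now hold exactly the count that was assigned to it.
print(np.array_equal(n, scores))
```

This keeps the plt.hist interface from the question while avoiding x-values that sit on bin edges; the bar plot shown earlier remains the more direct route.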
