Matplotlib xticks not lining up with histogram

Question

I'm generating some histograms with matplotlib and I'm having some trouble figuring out how to get the xticks of a histogram to align with the bars.

Here's a sample of the code I use to generate the histogram:

from matplotlib import pyplot as py

py.hist(histogram_data, 49, alpha=0.75)
py.title(column_name)
py.xticks(range(49))
py.show()

I know that all of the values in the histogram_data array are in [0, 1, ..., 48], which, assuming I did the math right, means there are 49 unique values. I'd like to show a histogram of each of those values. Here's a picture of what's generated.

How can I set up the graph such that all of the xticks are aligned to the left, middle or right of each of the bars?

Recommended answer

Short answer: Use plt.hist(data, bins=range(50)) instead to get left-aligned bins, plt.hist(data, bins=np.arange(50)-0.5) to get center-aligned bins, etc.

Also, if performance matters, because you want counts of unique integers, there are a couple of slightly more efficient methods (np.bincount) that I'll show at the end.

As a stand-alone example of what you're seeing, consider the following:

import matplotlib.pyplot as plt
import numpy as np

# Generate a random array of integers between 0-9
# data.min() will be 0 and data.max() will be 9 (not 10)
data = np.random.randint(0, 10, 1000)

plt.hist(data, bins=10)
plt.xticks(range(10))
plt.show()

As you've noticed, the bins aren't aligned with integer intervals. This is basically because you asked for 10 bins between 0 and 9, which isn't quite the same as asking for bins for the 10 unique values.
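
As a rough check of that arithmetic, the sketch below asks NumPy for the edges it would use for 10 equal-width bins. The data array is a deterministic stand-in (an assumption, not the random data above) so that the minimum and maximum are guaranteed to be 0 and 9. np.histogram_bin_edges needs NumPy 1.15 or newer.

```python
import numpy as np

# Stand-in data: 100 copies of each integer 0..9 (min is 0, max is 9).
data = np.repeat(np.arange(10), 100)

# Edges for 10 equal-width bins spanning data.min() to data.max().
edges = np.histogram_bin_edges(data, bins=10)
width = edges[1] - edges[0]

print(edges)   # 11 edges running from 0 to 9
print(width)   # each bin is 0.9 wide, not 1
```

Because each bin is 0.9 wide rather than 1, the edges drift further from the integer ticks the further right you go.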

The number of bins you want isn't exactly the same as the number of unique values. What you actually should do in this case is manually specify the bin edges.

To explain what's going on, let's skip matplotlib.pyplot.hist and just use the underlying numpy.histogram function.

For example, let's say you have the values [0, 1, 2, 3]. Your first instinct would be to do:

In [1]: import numpy as np

In [2]: np.histogram([0, 1, 2, 3], bins=4)
Out[2]: (array([1, 1, 1, 1]), array([ 0.  ,  0.75,  1.5 ,  2.25,  3.  ]))

The first array returned is the counts and the second is the bin edges (in other words, where bar edges would be in your plot).

Notice that we get the counts we'd expect, but because we asked for 4 bins between the min and max of the data, the bin edges aren't on integer values.

Next, you might try:

In [3]: np.histogram([0, 1, 2, 3], bins=3)
Out[3]: (array([1, 1, 2]), array([ 0.,  1.,  2.,  3.]))

Note that the bin edges (the second array) are what you were expecting, but the counts aren't. That's because the last bin behaves differently than the others, as noted in the documentation for numpy.histogram:

Notes
-----
All but the last (righthand-most) bin is half-open.  In other words, if
`bins` is::

  [1, 2, 3, 4]

then the first bin is ``[1, 2)`` (including 1, but excluding 2) and the
second ``[2, 3)``.  The last bin, however, is ``[3, 4]``, which *includes*
4.

Therefore, what you actually should do is specify exactly what bin edges you want, and either include one beyond your last data point or shift the bin edges to the 0.5 intervals. For example:

In [4]: np.histogram([0, 1, 2, 3], bins=range(5))
Out[4]: (array([1, 1, 1, 1]), array([0, 1, 2, 3, 4]))
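
The other option mentioned above, shifting the edges to the 0.5 marks, can be sketched the same way. The variable names here are illustrative only; the point is that each integer ends up in the middle of its own bin, so no value sits on an edge.

```python
import numpy as np

values = [0, 1, 2, 3]

# One edge half a unit below each value, plus one half a unit above the largest.
centered_edges = np.arange(min(values), max(values) + 2) - 0.5

counts, edges = np.histogram(values, bins=centered_edges)
print(counts)  # one count per value
print(edges)   # edges at -0.5, 0.5, 1.5, 2.5, 3.5
```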

Bin Alignment


Now let's apply this to the first example and see what it looks like:

import matplotlib.pyplot as plt
import numpy as np

# Generate a random array of integers between 0-9
# data.min() will be 0 and data.max() will be 9 (not 10)
data = np.random.randint(0, 10, 1000)

plt.hist(data, bins=range(11)) # <- The only difference
plt.xticks(range(10))
plt.show()

Okay, great! However, we now effectively have left-aligned bins. What if we wanted center-aligned bins to better reflect the fact that these are unique values?

The quick way is to just shift the bin edges:

import matplotlib.pyplot as plt
import numpy as np

# Generate a random array of integers between 0-9
# data.min() will be 0 and data.max() will be 9 (not 10)
data = np.random.randint(0, 10, 1000)

bins = np.arange(11) - 0.5
plt.hist(data, bins)
plt.xticks(range(10))
plt.xlim([-1, 10])

plt.show()

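Similarly, for right-aligned bins, shift the edges by just under 1 (for example, bins = np.arange(11) - 0.99). Shifting by exactly 1 leaves every value sitting on a bin edge, and because the bins are half-open, each value is then counted in the bin to its right, with the two largest values merged into the last bin.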
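
One thing to watch when shifting by a whole unit: integer data then falls exactly on the bin edges, so the half-open rule quoted from the numpy.histogram notes decides where each value is counted. The sketch below compares an exact shift of 1 with a shift of 0.99, using a small deterministic stand-in array (an assumption, chosen so the counts are easy to read).

```python
import numpy as np

# Stand-in data: three copies of each integer 0..9.
data = np.repeat(np.arange(10), 3)

# Edges at -1, 0, ..., 9: every value lies exactly on an edge.
exact, _ = np.histogram(data, bins=np.arange(11) - 1)

# Edges at -0.99, 0.01, ..., 9.01: every value lies just inside
# the right-hand end of its own bin.
nearly, _ = np.histogram(data, bins=np.arange(11) - 0.99)

print(exact)   # first bin empty, last bin holds both the 8s and the 9s
print(nearly)  # three counts in every bin
```

With the 0.99 shift, each bar ends just to the right of its tick, which reads as right-aligned at normal plot sizes.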

For the particular case of unique integer values, there's another, more efficient approach we can take.

If you're counting occurrences of non-negative integers starting at 0, you're better off using numpy.bincount than numpy.histogram.

For example:

import matplotlib.pyplot as plt
import numpy as np

data = np.random.randint(0, 10, 1000)
counts = np.bincount(data)

# Switching to the OO-interface. You can do all of this with "plt" as well.
fig, ax = plt.subplots()
ax.bar(range(10), counts, width=1, align='center')
ax.set(xticks=range(10), xlim=[-1, 10])

plt.show()

There are two big advantages to this approach. One is speed. numpy.histogram (and therefore plt.hist) basically runs the data through numpy.digitize and then numpy.bincount. Because you're dealing with unique integer values, there's no need to take the numpy.digitize step.

However, the bigger advantage is more control over display. If you'd prefer thinner rectangles, just use a smaller width:

import matplotlib.pyplot as plt
import numpy as np

data = np.random.randint(0, 10, 1000)
counts = np.bincount(data)

# Switching to the OO-interface. You can do all of this with "plt" as well.
fig, ax = plt.subplots()
ax.bar(range(10), counts, width=0.8, align='center')
ax.set(xticks=range(10), xlim=[-1, 10])

plt.show()
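
One practical detail with np.bincount: it returns data.max() + 1 counts, so if the largest possible value happens to be missing from the data, the counts array is shorter than range(10) and the ax.bar call would fail on mismatched lengths. Passing minlength guards against that. The sketch below uses a tiny stand-in array and a hypothetical n_values to show this, and checks that the result matches np.histogram with integer edges.

```python
import numpy as np

# Stand-in data: no 2s, and nothing above 3.
data = np.array([0, 1, 1, 3, 3, 3])
n_values = 5  # hypothetical: values 0..4 are all possible

# minlength pads the result so there is one count per possible value.
counts = np.bincount(data, minlength=n_values)

# Same counts via np.histogram with edges 0, 1, ..., n_values.
hist_counts, _ = np.histogram(data, bins=np.arange(n_values + 1))

print(counts)
print(hist_counts)
```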
