Matplotlib xticks not lining up with histogram

Question

I'm generating some histograms with matplotlib and I'm having some trouble figuring out how to get the xticks of a histogram to align with the bars.

Here's a sample of the code I use to generate the histogram:

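import numpy as np
from matplotlib import pyplot as plt

# Placeholder inputs (not from the original question): integers in [0, 48]
histogram_data = np.random.randint(0, 49, 1000)
column_name = "example column"

plt.hist(histogram_data, 49, alpha=0.75)
plt.title(column_name)
plt.xticks(range(49))
plt.show()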

I know that all of the values in the histogram_data array are in [0, 1, ..., 48], which, assuming I did the math right, means there are 49 unique values. I'd like the histogram to show one bar for each of those values. Here's a picture of what's generated.

How can I set up the graph such that all of the xticks are aligned to the left, middle or right of each of the bars?

Answer

Short answer: Use plt.hist(data, bins=range(50)) instead to get left-aligned bins, plt.hist(data, bins=np.arange(50)-0.5) to get center-aligned bins, etc.
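One way to sanity-check the short answer without drawing anything is to pass the same edges to numpy.histogram, which is what plt.hist uses to compute its counts. This is a sketch that assumes, as in the question, integer data in [0, 48]; the random array is only a stand-in for histogram_data.

```python
import numpy as np

# Stand-in for histogram_data: random integers in [0, 48]
data = np.random.randint(0, 49, 5000)

# Left-aligned: edges 0, 1, ..., 49 give 49 bins of width 1
left_counts, left_edges = np.histogram(data, bins=range(50))

# Center-aligned: edges -0.5, 0.5, ..., 48.5 give 49 bins centered on the integers
center_counts, center_edges = np.histogram(data, bins=np.arange(50) - 0.5)

# Both should give exactly one bin per unique value, matching a direct count
expected = np.bincount(data, minlength=49)
print(np.array_equal(left_counts, expected))    # True
print(np.array_equal(center_counts, expected))  # True
```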

Also, because you want counts of unique integers, there's a slightly more efficient approach using np.bincount that I'll show at the end, in case performance matters.

As a stand-alone example of what you're seeing, consider the following:

import matplotlib.pyplot as plt
import numpy as np

# Generate a random array of integers between 0-9
# data.min() will be 0 and data.max() will be 9 (not 10)
data = np.random.randint(0, 10, 1000)

plt.hist(data, bins=10)
plt.xticks(range(10))
plt.show()

As you've noticed, the bins aren't aligned with integer intervals. This is basically because you asked for 10 bins between 0 and 9, which isn't quite the same as asking for bins for the 10 unique values.

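Passing a number of bins isn't the same as asking for one bin per unique value: bins=10 places 11 evenly spaced edges between data.min() and data.max(), so here each bin is (9 - 0) / 10 = 0.9 wide and most edges fall between integers. What you actually should do in this case is specify the bin edges manually.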
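To see why the bins don't line up, numpy.histogram_bin_edges (available since NumPy 1.15) returns the edges that a given bins argument would produce. A minimal sketch, assuming data that spans 0 to 9 like the example above:

```python
import numpy as np

data = np.arange(10)  # values 0..9, so data.min() == 0 and data.max() == 9

# Ten equal-width bins between 0 and 9: each one is (9 - 0) / 10 = 0.9 wide
edges = np.histogram_bin_edges(data, bins=10)
print(edges)

# Explicit integer edges instead: one bin of width 1 per value
int_edges = np.histogram_bin_edges(data, bins=range(11))
print(int_edges)
```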

To explain what's going on, let's skip matplotlib.pyplot.hist and just use the underlying numpy.histogram function.

For example, let's say you have the values [0, 1, 2, 3]. Your first instinct would be to do:

In [1]: import numpy as np

In [2]: np.histogram([0, 1, 2, 3], bins=4)
Out[2]: (array([1, 1, 1, 1]), array([ 0.  ,  0.75,  1.5 ,  2.25,  3.  ]))

The first array returned is the counts and the second is the bin edges (in other words, where bar edges would be in your plot).
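Because plt.hist does its counting with numpy.histogram, it hands back the same two arrays (as n and bins, followed by the drawn bar patches). A quick check with the four values above:

```python
import matplotlib.pyplot as plt
import numpy as np

values = [0, 1, 2, 3]

np_counts, np_edges = np.histogram(values, bins=4)
# plt.hist computes the same counts and edges and also returns the bars it drew
n, bins, patches = plt.hist(values, bins=4)

print(np.array_equal(n, np_counts))  # True
print(np.allclose(bins, np_edges))   # True
plt.show()
```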

Notice that we get the counts we'd expect, but because we asked for 4 bins between the min and max of the data, the bin edges aren't on integer values.

Next, you might try:

In [3]: np.histogram([0, 1, 2, 3], bins=3)
Out[3]: (array([1, 1, 2]), array([ 0.,  1.,  2.,  3.]))

Note that the bin edges (the second array) are what you were expecting, but the counts aren't. That's because the last bin behaves differently than the others, as noted in the documentation for numpy.histogram:

Notes
-----
All but the last (righthand-most) bin is half-open.  In other words, if
`bins` is::

  [1, 2, 3, 4]

then the first bin is ``[1, 2)`` (including 1, but excluding 2) and the
second ``[2, 3)``.  The last bin, however, is ``[3, 4]``, which *includes*
4.
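The docstring's own example is easy to reproduce: with edges [1, 2, 3, 4], the value 4 lands in the last bin rather than being dropped.

```python
import numpy as np

values = [1, 2, 3, 4]
counts, edges = np.histogram(values, bins=[1, 2, 3, 4])

# Bins are [1, 2), [2, 3) and [3, 4]; the closed last bin holds both 3 and 4
print(counts)  # [1 1 2]
```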

Therefore, what you should actually do is specify exactly the bin edges you want, and either include an edge one beyond your last data point or shift the bin edges by 0.5. For example:

In [4]: np.histogram([0, 1, 2, 3], bins=range(5))
Out[4]: (array([1, 1, 1, 1]), array([0, 1, 2, 3, 4]))
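The other option, shifting the edges by 0.5, gives the same counts, with each integer sitting in the middle of its bin:

```python
import numpy as np

# Edges -0.5, 0.5, 1.5, 2.5, 3.5: each integer falls in the middle of a bin
counts, edges = np.histogram([0, 1, 2, 3], bins=np.arange(5) - 0.5)
print(counts)  # [1 1 1 1]
```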

Bin Alignment


Now let's apply this to the first example and see what it looks like:

import matplotlib.pyplot as plt
import numpy as np

# Generate a random array of integers between 0-9
# data.min() will be 0 and data.max() will be 9 (not 10)
data = np.random.randint(0, 10, 1000)

plt.hist(data, bins=range(11)) # <- The only difference
plt.xticks(range(10))
plt.show()

Okay, great! However, we now effectively have left-aligned bins. What if we wanted center-aligned bins to better reflect the fact that these are unique values?

The quick way is to just shift the bin edges:

import matplotlib.pyplot as plt
import numpy as np

# Generate a random array of integers between 0-9
# data.min() will be 0 and data.max() will be 9 (not 10)
data = np.random.randint(0, 10, 1000)

bins = np.arange(11) - 0.5
plt.hist(data, bins)
plt.xticks(range(10))
plt.xlim([-1, 10])

plt.show()

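Right-aligned bins need more care. Shifting the edges by -1 (bins = np.arange(11) - 1) leaves them on integers, so the bars don't actually move relative to the values, and the closed last bin [8, 9] ends up counting both the 8s and the 9s. Drawing precomputed counts with bar, using a negative width together with align='edge', avoids this.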
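A quick check of the right-aligned case, using the same kind of random 0-9 data as above. The numpy line reproduces the empty first bin and doubled last bin you get from whole-unit shifted edges; the plotting part is a sketch of one alternative, counting with np.bincount and drawing with Axes.bar using a negative width and align='edge', which the matplotlib documentation gives as the way to align bars on their right edge.

```python
import matplotlib.pyplot as plt
import numpy as np

data = np.random.randint(0, 10, 1000)

# Pitfall: edges -1, 0, ..., 9 are still integers, so the first bin [-1, 0)
# stays empty and the closed last bin [8, 9] counts both 8 and 9
shifted_counts, _ = np.histogram(data, bins=np.arange(11) - 1)

# Alternative: count each value, then draw the bar for value k over [k - 1, k]
counts = np.bincount(data, minlength=10)
fig, ax = plt.subplots()
# A negative width with align='edge' puts each bar's right edge at its x value
ax.bar(np.arange(10), counts, width=-1, align='edge')
ax.set(xticks=range(10), xlim=[-2, 10])

plt.show()
```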

For the particular case of unique integer values, there's another, more efficient approach we can take.

If you're dealing with counts of unique integers starting at 0, you're better off using numpy.bincount than numpy.histogram.

For example:

import matplotlib.pyplot as plt
import numpy as np

data = np.random.randint(0, 10, 1000)
counts = np.bincount(data, minlength=10)  # always 10 counts, one per value 0-9

# Switching to the OO-interface. You can do all of this with "plt" as well.
fig, ax = plt.subplots()
ax.bar(range(10), counts, width=1, align='center')
ax.set(xticks=range(10), xlim=[-1, 10])

plt.show()

There are two big advantages to this approach. One is speed. numpy.histogram (and therefore plt.hist) has to work out which bin each value falls into before counting (conceptually, numpy.digitize followed by numpy.bincount). Because you're dealing with integers starting at 0, each value already is its own bin index, so there's no need for that numpy.digitize step.
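As a rough illustration of that bin-assignment step: for integer data, the same counts come out of numpy.histogram, of np.digitize followed by np.bincount, and of np.bincount alone. This is only a sketch of the idea, not a description of how numpy.histogram is implemented internally.

```python
import numpy as np

data = np.random.randint(0, 10, 1000)
edges = np.arange(11)  # 0, 1, ..., 10

# 1. Full histogram with explicit edges
hist_counts, _ = np.histogram(data, bins=edges)

# 2. Assign bins first, then count; digitize returns 1-based bin numbers here
bin_index = np.digitize(data, edges) - 1
two_step_counts = np.bincount(bin_index, minlength=10)

# 3. Integers starting at 0 are already their own bin index
direct_counts = np.bincount(data, minlength=10)

print(np.array_equal(hist_counts, two_step_counts))  # True
print(np.array_equal(hist_counts, direct_counts))    # True
```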

However, the bigger advantage is more control over display. If you'd prefer thinner rectangles, just use a smaller width:

import matplotlib.pyplot as plt
import numpy as np

data = np.random.randint(0, 10, 1000)
counts = np.bincount(data, minlength=10)  # always 10 counts, one per value 0-9

# Switching to the OO-interface. You can do all of this with "plt" as well.
fig, ax = plt.subplots()
ax.bar(range(10), counts, width=0.8, align='center')
ax.set(xticks=range(10), xlim=[-1, 10])

plt.show()
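One detail to keep in mind with np.bincount: its result has length data.max() + 1, so if the largest expected value never occurs, the counts array comes back shorter than the range of x positions passed to bar. The minlength argument pads it with zeros, as this small check shows:

```python
import numpy as np

sample = np.array([0, 1, 1, 3])  # no values from 4 to 9 in this sample

print(np.bincount(sample))               # [1 2 0 1]
print(np.bincount(sample, minlength=10)) # [1 2 0 1 0 0 0 0 0 0]
```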
