Matplotlib histogram with collection bin for high values

Question

I have an array with values, and I want to create a histogram of it. I am mainly interested in the low end numbers, and want to collect every number above 300 in one bin. This bin should have the same width as all other (equally wide) bins. How can I do this?

Note: this question is related to this question: Defining bin width/x-axis scale in Matplotlib histogram

This is what I tried so far:

import matplotlib.pyplot as plt
import numpy as np

def plot_histogram_01():
    np.random.seed(1)
    values_A = np.random.choice(np.arange(600), size=200, replace=True).tolist()
    values_B = np.random.choice(np.arange(600), size=200, replace=True).tolist()

    bins = [0, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 600]

    fig, ax = plt.subplots(figsize=(9, 5))
    _, bins, patches = plt.hist([values_A, values_B], density=True,  # density replaces the deprecated normed argument
                                bins=bins,
                                color=['#3782CC', '#AFD5FA'],
                                label=['A', 'B'])

    xlabels = [str(int(b)) for b in bins[1:]]  # plain string labels for the ticks
    xlabels[-1] = '300+'

    N_labels = len(xlabels)
    plt.xlim([0, 600])
    plt.xticks(25 * np.arange(N_labels) + 12.5)
    ax.set_xticklabels(xlabels)

    plt.yticks([])
    plt.title('')
    plt.setp(patches, linewidth=0)
    plt.legend()

    fig.tight_layout()
    plt.savefig('my_plot_01.png')
    plt.close()

This is the result, which does not look nice:

I then changed the line with xlim in it:

plt.xlim([0, 325])

With the following result:

It looks more or less as I want it, but the last bin is not visible now. Which trick am I missing to visualize this last bin with a width of 25?

Recommended answer

NumPy has a handy function for dealing with this: np.clip. Despite what the name may sound like, it doesn't remove values; it just limits them to the range you specify, so every value above the upper limit is set to that limit. Basically, it does Artem's "dirty hack" inline. You can leave the values as they are, but in the hist call, just wrap the array in an np.clip call, like so:

plt.hist(np.clip(values_A, bins[0], bins[-1]), bins=bins)
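To see what clipping does to the counts, here is a minimal sketch that calls np.histogram directly instead of plotting. The sample values are made up for illustration; only np.clip and np.histogram come from NumPy.

```python
import numpy as np

# Made-up sample values; three of them lie above the last bin edge (325).
values = np.array([10, 40, 310, 450, 599, 1000])
bins = np.arange(0, 350, 25)  # edges 0, 25, ..., 325

# Values outside [bins[0], bins[-1]] are moved onto the nearest limit.
clipped = np.clip(values, bins[0], bins[-1])

# Count per bin; the last bin includes its right edge, so clipped values land in it.
counts, _ = np.histogram(clipped, bins=bins)
unclipped_counts, _ = np.histogram(values, bins=bins)

print(clipped)
print(counts[-1], unclipped_counts[-1])
```

Without the clip, values beyond the last edge are simply left out of the histogram, which is why the last bin looks empty or small in the unclipped case.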

This is nicer for a number of reasons:

  1. It's much faster, at least for large numbers of elements. NumPy does its work at the C level, while operating on Python lists (as in Artem's list comprehension) has a lot of overhead for each element. Basically, if you ever have the option to use NumPy, you should.

  2. You do it right where it's needed, which reduces the chance of making mistakes in your code.

  3. You don't need to keep a second copy of the array hanging around, which reduces memory usage (except within this one line) and further reduces the chances of making mistakes.

  4. Using bins[0], bins[-1] instead of hard-coding the values reduces the chances of making mistakes again, because you only change the bins where bins is defined; you don't need to remember to change them in the call to clip or anywhere else.
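The speed claim can be checked with a rough timing sketch. Artem's list comprehension is not reproduced here, so clip_with_list_comprehension below is a hypothetical stand-in written only for this comparison; the array size and repeat count are arbitrary assumptions, and actual timings depend on the machine.

```python
import timeit

import numpy as np

rng = np.random.default_rng(1)
values = rng.integers(0, 600, size=100_000)  # arbitrary test data
values_list = values.tolist()
upper = 325


def clip_with_list_comprehension(vals, hi):
    # Hypothetical pure-Python equivalent: cap each element at hi.
    return [min(v, hi) for v in vals]


def clip_with_numpy(vals, hi):
    # Vectorized version; the lower limit 0 matches the smallest possible value.
    return np.clip(vals, 0, hi)


t_list = timeit.timeit(lambda: clip_with_list_comprehension(values_list, upper), number=5)
t_numpy = timeit.timeit(lambda: clip_with_numpy(values, upper), number=5)
print(f"list comprehension: {t_list:.4f} s, np.clip: {t_numpy:.4f} s")
```

Both functions produce the same capped values; only the time taken differs.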

So to put it all together as in the OP:

import matplotlib.pyplot as plt
import numpy as np

def plot_histogram_01():
    np.random.seed(1)
    values_A = np.random.choice(np.arange(600), size=200, replace=True)
    values_B = np.random.choice(np.arange(600), size=200, replace=True)

    bins = np.arange(0,350,25)

    fig, ax = plt.subplots(figsize=(9, 5))
    _, bins, patches = plt.hist([np.clip(values_A, bins[0], bins[-1]),
                                 np.clip(values_B, bins[0], bins[-1])],
                                density=True,  # replaces the deprecated normed argument
                                bins=bins, color=['#3782CC', '#AFD5FA'], label=['A', 'B'])

    xlabels = [str(int(b)) for b in bins[1:]]  # integer-looking labels even if bins is a float array
    xlabels[-1] += '+'

    N_labels = len(xlabels)
    plt.xlim([0, 325])
    plt.xticks(25 * np.arange(N_labels) + 12.5)
    ax.set_xticklabels(xlabels)

    plt.yticks([])
    plt.title('')
    plt.setp(patches, linewidth=0)
    plt.legend(loc='upper left')

    fig.tight_layout()
    plt.show()  # display the figure (use plt.savefig instead to write a file)


plot_histogram_01()
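The function above labels each bin with its upper edge, so the collection bin is labeled with its upper edge plus '+'. If you would rather have range labels with the last bin named after its lower edge (300+), one option is a small helper like the sketch below. make_bin_labels is a hypothetical name introduced here, not part of Matplotlib or NumPy.

```python
import numpy as np


def make_bin_labels(bins):
    """Return one label per bin, e.g. '0-25', with the last bin as '<lower edge>+'."""
    edges = [int(b) for b in bins]
    labels = [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]
    labels[-1] = f"{edges[-2]}+"  # the collection bin holds everything from its lower edge up
    return labels


bins = np.arange(0, 350, 25)
labels = make_bin_labels(bins)
centers = (bins[:-1] + bins[1:]) / 2  # tick positions in the middle of each bin

print(labels)
```

Inside the plotting function, plt.xticks(centers, labels) would then place these labels at the bin centers.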
