How to choose bins in matplotlib histogram

Question

Can someone explain to me what "bins" in a histogram are (in the matplotlib hist function)? Assuming I need to plot the probability density function of some data, how do the bins I choose influence that, and how do I choose them? (I have already read about them in the matplotlib.pyplot.hist and numpy.histogram documentation, but I did not get the idea.)

Answer

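Bins are the intervals that the range of your data is divided into; each bar of the histogram shows how many values fall into one bin. The bins parameter controls that division. You can specify it as an integer (the number of equal-width bins) or as a sequence of bin edges.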

For example, here we ask for 20 bins:

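import numpy as np
import matplotlib.pyplot as plt

# 1000 samples drawn from a standard normal distribution
x = np.random.randn(1000)
# an integer gives that many equal-width bins between min(x) and max(x)
plt.hist(x, bins=20)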

And here we ask for bin edges at the locations [-4, -3, -2... 3, 4].

plt.hist(x, bins=range(-4, 5))
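To see what the two forms of bins actually produce, here is a minimal sketch using numpy.histogram, which computes the same counts and edges that plt.hist draws. The seeded generator and the variable names are assumptions made for illustration. It also touches on the probability density part of the question: with density=True the bar heights are scaled so that the total area is 1.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
x = rng.standard_normal(1000)

# Integer: 20 equal-width bins spanning min(x)..max(x), so 21 edges
counts20, edges20 = np.histogram(x, bins=20)

# Sequence: explicit edges at -4, -3, ..., 4, so 8 bins of width 1
# (values outside the outermost edges are not counted)
counts8, edges8 = np.histogram(x, bins=range(-4, 5))

# density=True rescales heights so that sum(height * bin width) == 1
density20, _ = np.histogram(x, bins=20, density=True)
area = (density20 * np.diff(edges20)).sum()

print(len(counts20), len(edges20))
print(len(counts8), edges8)
print(area)
```

Fewer, wider bins give a smoother but coarser density estimate; more, narrower bins follow the data more closely but get noisier.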

Your question about how to choose the "best" number of bins is an interesting one, and there's actually a fairly vast literature on the subject. There are some commonly-used rules-of-thumb that have been proposed (e.g. the Freedman-Diaconis Rule, Sturges' Rule, Scott's Rule, the Square-root rule, etc.) each of which has its own strengths and weaknesses.
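As a rough illustration of how much these rules can disagree, the following sketch asks NumPy for the bin edges each rule would choose on the same sample. It assumes NumPy 1.15 or later, where numpy.histogram_bin_edges is available; the sample and the dictionary of labels are only for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
x = rng.standard_normal(1000)

# Rule names understood by numpy.histogram_bin_edges, with readable labels
rules = {
    "fd": "Freedman-Diaconis",
    "sturges": "Sturges",
    "scott": "Scott",
    "sqrt": "Square-root",
}

n_bins = {}
for key, label in rules.items():
    edges = np.histogram_bin_edges(x, bins=key)
    n_bins[key] = len(edges) - 1  # n edges delimit n - 1 bins
    print(f"{label:18s} -> {n_bins[key]} bins")
```

On well-behaved, roughly normal data the rules tend to land in the same ballpark; they diverge more on skewed or heavy-tailed data, which is where the choice of rule starts to matter.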

If you want a nice Python implementation of a variety of these auto-tuning histogram rules, you might check out the histogram functionality in the latest version of the AstroPy package, described here. This works just like plt.hist, but lets you use syntax like, e.g. hist(x, bins='freedman') for choosing bins via the Freedman-Diaconis rule mentioned above.

My personal favorite is "Bayesian Blocks" (bins="blocks"), which solves for optimal binning with unequal bin widths. You can read a bit more on that here.


Edit, April 2017: with matplotlib version 2.0 or later and numpy version 1.11 or later, you can now specify automatically-determined bins directly in matplotlib, by specifying, e.g. bins='auto'. This uses the maximum of the Sturges and Freedman-Diaconis bin choice. You can read more about the options in the numpy.histogram docs.
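A minimal sketch of the bins='auto' option, compared against the two rules it is built from. The Agg backend line is an assumption so that the snippet runs without a display; drop it and call plt.show() for interactive use. numpy.histogram_bin_edges requires NumPy 1.15 or later.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
x = rng.standard_normal(1000)

# Let NumPy pick the bins; plt.hist returns counts, edges and the bar patches
counts, edges, patches = plt.hist(x, bins="auto")

n_auto = len(edges) - 1
n_sturges = len(np.histogram_bin_edges(x, bins="sturges")) - 1
n_fd = len(np.histogram_bin_edges(x, bins="fd")) - 1

print(n_auto, n_sturges, n_fd)
```

For this sample the automatic choice should match the larger of the Sturges and Freedman-Diaconis bin counts.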
