How to choose bins in a matplotlib histogram
Question
Can someone explain to me what the "bins" in a histogram are (in the matplotlib hist function)? And assuming I need to plot the probability density function of some data, how do the bins I choose influence that? And how do I choose them? (I already read about them in the matplotlib.pyplot.hist and numpy.histogram documentation, but I did not get the idea.)
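Answer
The bins parameter controls how your data are divided into intervals ("bins"); each bar of the histogram shows how many data points fall into its interval. You can specify it as an integer, giving the number of equal-width bins spanning the range of the data, or as a list of bin edges.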
For example, here we ask for 20 bins:
import numpy as np
import matplotlib.pyplot as plt
x = np.random.randn(1000)
plt.hist(x, bins=20)
And here we ask for bin edges at the locations [-4, -3, -2, ..., 3, 4], which gives eight bins of width 1 (values outside [-4, 4] are not counted):
plt.hist(x, bins=range(-4, 5))
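plt.hist computes its bar heights with numpy.histogram, so the effect of both forms of bins can be checked directly on the counts and edges NumPy returns. The sketch below uses a seeded standard-normal sample standing in for the x above (the seed is an arbitrary choice for reproducibility) and also passes density=True, which rescales the counts so the bars integrate to 1; that normalized histogram is the one to compare against a probability density function.

```python
import numpy as np

# Seeded standard-normal sample, standing in for the x used above.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)

# Integer bins: 20 equal-width bins spanning [x.min(), x.max()].
counts_int, edges_int = np.histogram(x, bins=20)

# Explicit edges: 9 edges at -4, -3, ..., 4, i.e. 8 unit-width bins.
# Samples outside [-4, 4] are simply not counted.
counts_edges, edges_explicit = np.histogram(x, bins=range(-4, 5))

# density=True divides each count by (total count * bin width),
# so the bar areas sum to 1 and approximate a probability density.
density, edges_d = np.histogram(x, bins=20, density=True)
area = np.sum(density * np.diff(edges_d))

print(len(counts_int), len(edges_int))         # 20 bins, 21 edges
print(len(counts_edges), len(edges_explicit))  # 8 bins, 9 edges
print(area)                                    # 1 up to floating-point rounding
```

With density=True the bar heights depend on the bin width, which is why the choice of bins visibly changes the shape of the estimated PDF.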
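Your question about how to choose the "best" number of bins is an interesting one, and there's actually a fairly vast literature on the subject. The basic trade-off is that too few bins smooth away real structure in the distribution, while too many leave only a handful of points in each bin, so the estimated density becomes noisy. There are some commonly used rules of thumb that have been proposed (e.g. the Freedman-Diaconis rule, Sturges' rule, Scott's rule, the square-root rule, etc.), each of which has its own strengths and weaknesses.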
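To see how much these rules of thumb can disagree, NumPy exposes several of them as string values of bins in numpy.histogram_bin_edges (this assumes NumPy 1.15 or later, where that function was added). The sketch below counts the bins each rule picks for seeded normal samples of a few sizes; the helper name count_bins_by_rule is hypothetical, while the rule strings are the ones NumPy accepts.

```python
import numpy as np

def count_bins_by_rule(x, rules=("sturges", "sqrt", "scott", "fd")):
    """Return {rule name: number of bins} for a few NumPy binning rules.

    count_bins_by_rule is a hypothetical helper for illustration; the
    rule strings are those accepted by numpy.histogram_bin_edges.
    """
    return {rule: len(np.histogram_bin_edges(x, bins=rule)) - 1 for rule in rules}

rng = np.random.default_rng(0)
for n in (100, 1000, 10000):
    print(n, count_bins_by_rule(rng.standard_normal(n)))
```

Sturges' rule grows only logarithmically with the sample size, the square-root rule grows like n^(1/2), and Scott's and Freedman-Diaconis grow like n^(1/3), which is one concrete way their strengths and weaknesses show up.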
If you want a nice Python implementation of a variety of these auto-tuning histogram rules, you might check out the histogram functionality in the latest version of the AstroPy package, described here. This works just like plt.hist, but lets you use syntax such as hist(x, bins='freedman') to choose bins via the Freedman-Diaconis rule mentioned above.
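The Freedman-Diaconis rule itself is short: it sets the bin width to h = 2 * IQR * n^(-1/3), where IQR is the interquartile range and n the sample size, and the number of bins is the data range divided by h, rounded up. The sketch below computes this by hand with NumPy only (no AstroPy); freedman_diaconis_bins is a hypothetical helper name, and the comparison assumes NumPy's bins='fd' applies the same formula.

```python
import numpy as np

def freedman_diaconis_bins(x):
    """Number of equal-width bins from the Freedman-Diaconis rule.

    Hypothetical helper: bin width h = 2 * IQR * n**(-1/3), and the bin
    count is ceil((max - min) / h).
    """
    x = np.asarray(x, dtype=float).ravel()
    q75, q25 = np.percentile(x, [75, 25])
    width = 2.0 * (q75 - q25) * x.size ** (-1.0 / 3.0)
    if width == 0:
        # Degenerate IQR (e.g. mostly repeated values): fall back to one bin.
        return 1
    return int(np.ceil((x.max() - x.min()) / width))

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
print(freedman_diaconis_bins(x))
print(len(np.histogram_bin_edges(x, bins="fd")) - 1)  # NumPy's own choice
```

Because the width is based on the IQR rather than the standard deviation, a few extreme outliers barely change h, although they do stretch the range and so increase the bin count.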
My personal favorite is "Bayesian Blocks" (bins="blocks"), which solves for optimal binning with unequal bin widths. You can read a bit more on that here.
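To give a feel for how Bayesian Blocks ends up with unequal bin widths, here is a compact dynamic-programming sketch of the idea for 1-D event data. It is not AstroPy's implementation: bayesian_blocks_edges is a hypothetical helper, it assumes distinct data values, and it uses a fixed penalty of 4 per extra block rather than the prior AstroPy computes by default. Each block is scored by the log-likelihood of a constant rate, N * (log N - log width), and the best partition is built up one data point at a time.

```python
import numpy as np

def bayesian_blocks_edges(t, penalty=4.0):
    """Simplified Bayesian Blocks bin edges for 1-D event data.

    Hypothetical helper for illustration: assumes distinct values and uses
    a fixed per-block penalty instead of a calibrated prior.
    """
    t = np.sort(np.asarray(t, dtype=float))
    n = t.size
    # Candidate edges: the data range split at midpoints between neighbours.
    edges = np.concatenate([t[:1], 0.5 * (t[1:] + t[:-1]), t[-1:]])
    # Distance from each candidate edge to the right end of the data.
    to_end = t[-1] - edges

    best = np.zeros(n)             # best total fitness for cells 0..k
    last = np.zeros(n, dtype=int)  # start cell of the final block in that optimum
    for k in range(n):
        # Candidate final blocks cover cells r..k for r = 0..k.
        width = to_end[: k + 1] - to_end[k + 1]
        count = np.arange(k + 1, 0, -1, dtype=float)  # number of cells r..k
        # Constant-rate log-likelihood minus a fixed per-block penalty.
        fitness = count * (np.log(count) - np.log(width)) - penalty
        # Add the best fitness of everything before cell r.
        fitness[1:] += best[:k]
        last[k] = np.argmax(fitness)
        best[k] = fitness[last[k]]

    # Walk back through the recorded block starts to recover the edges.
    change_points = [n]
    ind = n
    while ind > 0:
        ind = last[ind - 1]
        change_points.append(ind)
    return edges[change_points[::-1]]

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 0.2, 300)])
edges = bayesian_blocks_edges(data)
print(len(edges) - 1, np.round(np.diff(edges), 2))
```

The returned edges can be passed straight to plt.hist(data, bins=edges); dense regions end up in narrow blocks and sparse regions in wide ones. For real work, AstroPy's astropy.stats.bayesian_blocks (or bins="blocks" in its hist) is the better choice than this O(n^2) sketch.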
Edit, April 2017: with matplotlib version 2.0 or later and numpy version 1.11 or later, you can now specify automatically determined bins directly in matplotlib, e.g. with bins='auto'. This uses the larger of the bin counts given by the Sturges and Freedman-Diaconis rules (equivalently, the smaller of their bin widths). You can read more about the options in the numpy.histogram docs.
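One way to check the claim about bins='auto' is to compare bin counts from numpy.histogram_bin_edges, which accepts the same rule names that plt.hist passes through to NumPy. Since 'auto' picks the smaller of the Sturges and Freedman-Diaconis bin widths, its bin count should match the larger of the two counts. A minimal sketch, assuming NumPy 1.15 or later (where histogram_bin_edges was added); n_bins is a hypothetical helper name.

```python
import numpy as np

def n_bins(x, rule):
    """Number of bins NumPy picks for a given rule name (hypothetical helper)."""
    return len(np.histogram_bin_edges(x, bins=rule)) - 1

rng = np.random.default_rng(0)
for size in (20, 200, 20000):
    x = rng.standard_normal(size)
    counts = {rule: n_bins(x, rule) for rule in ("sturges", "fd", "auto")}
    # 'auto' should equal whichever of Sturges / Freedman-Diaconis gives more bins.
    print(size, counts, counts["auto"] == max(counts["sturges"], counts["fd"]))
```

For small samples Sturges usually wins, and for large samples Freedman-Diaconis takes over, which is the reason for combining the two.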