Matplotlib: How to convert a histogram to a discrete probability mass function?


Question

I have a question about matplotlib's hist() function.

I am writing code to plot a histogram of data whose values range from 0 to 1. For example:

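import numpy as np
import matplotlib.pyplot as plt

values = [0.21, 0.51, 0.41, 0.21, 0.81, 0.99]

bins = np.arange(0, 1.1, 0.1)
# density=False is the current spelling of the old normed=0: raw counts per bin
a, b, c = plt.hist(values, bins=bins, density=False)
plt.show()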

The code above generates a correct histogram (I could not post an image since I do not have enough reputation). In terms of frequencies, it looks like:

[0 0 2 0 1 1 0 0 1 1]

I would like to convert this output to a discrete probability mass function, i.e. for the example above, I would like to get the following values:

[ 0.  0.  0.333333333  0.  0.166666667  0.166666667  0.  0.  0.166666667  0.166666667 ]  # each item in the previous array divided by 6

I thought I simply needed to change the parameter in the hist() function to 'normed=1'. However, I get the following histogram frequencies:

[ 0.  0.  3.33333333  0.  1.66666667  1.66666667  0.  0.  1.66666667  1.66666667 ]

This is not what I expected, and I don't know how to get a discrete probability mass function whose values sum to 1.0. A similar question was asked in the following link (link to the question), but I do not think that question was resolved.

Thanks for your help.

Recommended answer

The reason is that normed=True (spelled density=True in current Matplotlib and NumPy releases) gives the probability density function, not the probability mass function. In probability theory, a probability density function, or density of a continuous random variable, describes the relative likelihood of that random variable taking on a given value.
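As a worked check of the numbers in the question, the following sketch assumes the usual histogram density definition, with n_i the count in bin i, N the number of samples, and w the bin width (these symbols are introduced here for illustration):

```latex
\[
  \text{density}_i = \frac{n_i}{N\,w}, \qquad
  \frac{2}{6 \times 0.1} \approx 3.33, \qquad
  \frac{1}{6 \times 0.1} \approx 1.67
\]
\[
  \sum_i \text{density}_i \, w = (3.33 + 4 \times 1.67) \times 0.1 \approx 1
\]
```

So the values 3.33 and 1.67 are densities whose integral over the bins is 1; multiplying each by the bin width 0.1 gives 0.333 and 0.167, the probabilities the question asks for.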

Let us consider a very simple example.

import numpy as np

x = np.arange(0.1, 1.1, 0.1)
# array([ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])

# Bin width 0.1
bins = np.arange(0.05, 1.15, 0.1)
np.histogram(x, bins=bins, density=True)[0]     # probability density (formerly normed=1)
# [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]
np.histogram(x, bins=bins)[0] / float(len(x))   # probability per bin
# [ 0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1]

# Change the bin width to 0.2
bins = np.arange(0.05, 1.15, 0.2)
np.histogram(x, bins=bins, density=True)[0]     # density is unchanged
# [ 1.,  1.,  1.,  1.,  1.]
np.histogram(x, bins=bins)[0] / float(len(x))   # probability per bin doubles
# [ 0.2,  0.2,  0.2,  0.2,  0.2]

As you can see above, the probability that x lies in [0.05, 0.15] or [0.15, 0.25] is 1/10, whereas if you change the bin width to 0.2, the probability that it lies in [0.05, 0.25] or [0.25, 0.45] is 1/5. These probability values depend on the bin width, but the probability density does not. That is why normalizing to a density is the proper way to scale a histogram of a continuous variable; otherwise one would need to state the bin width on every plot.
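The link between the two quantities can be sketched in a few lines: multiplying the density of each bin by that bin's width gives the probability of landing in the bin. The helper name density_and_mass is hypothetical, introduced only for this illustration, and the sketch assumes a NumPy version that accepts density=True.

```python
import numpy as np

x = np.arange(0.1, 1.1, 0.1)  # the same ten sample values as in the example


def density_and_mass(data, bins):
    """Return (density, probability mass) per bin for the given bin edges."""
    density, edges = np.histogram(data, bins=bins, density=True)
    widths = np.diff(edges)      # width of each bin
    mass = density * widths      # probability that a sample falls in each bin
    return density, mass


for step in (0.1, 0.2):
    bins = np.arange(0.05, 1.15, step)
    density, mass = density_and_mass(x, bins)
    # The density stays the same for both bin widths; the mass scales with the width.
    print(step, density.round(3), mass.round(3), round(mass.sum(), 6))
```

The density array is the same for both bin widths, while the per-bin mass changes from 0.1 to 0.2 and still sums to 1.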

So in your case, if you really want to plot the probability of each bin (and not the probability density), you can simply divide the count in each bin by the total number of elements. However, I would suggest not doing this unless you are working with a discrete variable and each of your bins represents a single possible value of that variable.
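Applied to the data from the question, the division described above might look like the following sketch. It uses only np.histogram; the variable names pmf, weights, and pmf_weighted are illustrative, and the second variant (per-sample weights of 1/N) is an alternative added here, not something the original answer spells out.

```python
import numpy as np

values = [0.21, 0.51, 0.41, 0.21, 0.81, 0.99]
bins = np.arange(0, 1.1, 0.1)

# Raw counts per bin, divided by the total number of elements.
counts, edges = np.histogram(values, bins=bins)
pmf = counts / len(values)

# Alternative: weight every sample by 1/N so the bin totals are probabilities.
weights = np.full(len(values), 1.0 / len(values))
pmf_weighted, _ = np.histogram(values, bins=bins, weights=weights)

print(pmf)
print(pmf.sum())
```

Both arrays sum to 1 here because every sample falls inside the bin range. To draw the result, plt.hist accepts the same weights argument, so plt.hist(values, bins=bins, weights=weights) should produce bars with these heights.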
