How to calculate probabilities using numpy.histogram and then use them to calculate KL divergence?


Question

In the following code, density=True returns the value of the probability density function at each bin. If I need P(x), can I say that hist contains probabilities? For example, if the center of the first bin is 0.5, can I say that the probability at x = 0.5 is hist[0]? I need to compute a KL divergence, which uses P(x).

import numpy as np

x = np.array([0, 0, 0, 0, 0, 3, 3, 2, 2, 2, 1, 1, 1, 1])
hist, bin_edges = np.histogram(x, bins=10, density=True)

Recommended answer

When you set density=True, NumPy returns an estimate of the probability density function (call it p), one value per bin. A density value is not a probability: for a continuous variable the probability of any single point, such as P(x = 0.5), is 0, because probability is defined as the area under the PDF curve. So, to compute a probability you have to choose a range and integrate the PDF over it. For a histogram, that means summing each density value multiplied by its bin width over the bins in that range.
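
As a minimal sketch of that last step, the snippet below turns the densities from the question's histogram into per-bin probabilities. The names density, bin_widths and probs are chosen here for illustration and are not part of the original answer.

```python
import numpy as np

x = np.array([0, 0, 0, 0, 0, 3, 3, 2, 2, 2, 1, 1, 1, 1])
density, bin_edges = np.histogram(x, bins=10, density=True)

# Each bin's probability is its density times its width (area of the bar).
bin_widths = np.diff(bin_edges)
probs = density * bin_widths

# A density value can exceed 1, which a probability never can.
print(density[0])

# Probability that a sample falls in the first bin, i.e. in [0, 0.3).
print(probs[0])

# Probability over a wider range: sum the per-bin probabilities it covers.
print(probs[:4].sum())

# All bins together cover the whole sample, so the total is 1.
print(probs.sum())
```

Calling np.histogram with density=False and dividing the counts by their total should give the same per-bin probabilities without the multiplication step.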

For the KL divergence, I can share my solution for computing mutual information, which is itself a KL divergence (between the joint distribution and the product of its marginals):

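import numpy as np
from scipy import ndimage

# EPS was not defined in the original snippet; machine epsilon is assumed here
EPS = np.finfo(float).eps

def mutual_information(x, y, sigma=1):
    bins = (256, 256)
    # joint histogram of x and y
    hist_xy = np.histogram2d(x, y, bins=bins)[0]

    # smooth it in place for better results
    ndimage.gaussian_filter(hist_xy, sigma=sigma, mode='constant', output=hist_xy)

    # normalize to a joint probability table
    hist_xy = hist_xy + EPS  # prevent log(0)
    hist_xy = hist_xy / np.sum(hist_xy)

    # compute marginals (rows of hist_xy index x, columns index y)
    hist_x = np.sum(hist_xy, axis=1)
    hist_y = np.sum(hist_xy, axis=0)

    # compute mi as H(X) + H(Y) - H(X, Y)
    mi = (np.sum(hist_xy * np.log(hist_xy))
          - np.sum(hist_x * np.log(hist_x))
          - np.sum(hist_y * np.log(hist_y)))
    return mi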

KL could be computed like this (please note that I did not test this!):

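def kl(x, y, sigma=1, bins=256):
    # assumes np (numpy), ndimage (scipy.ndimage) and a small positive EPS are defined
    # x and y must have the same length and share one set of bin edges,
    # otherwise their marginals cannot be compared bin by bin
    lo, hi = min(np.min(x), np.min(y)), max(np.max(x), np.max(y))
    hist_xy = np.histogram2d(x, y, bins=bins, range=[[lo, hi], [lo, hi]])[0]

    # smooth it out for better results
    ndimage.gaussian_filter(hist_xy, sigma=sigma, mode='constant', output=hist_xy)

    # normalize, then compute marginals (rows index x, columns index y)
    hist_xy = hist_xy + EPS  # prevent division by 0
    hist_xy = hist_xy / np.sum(hist_xy)
    hist_x = np.sum(hist_xy, axis=1)
    hist_y = np.sum(hist_xy, axis=0)

    # KL(P_x || P_y) = sum of P_x * log(P_x / P_y)
    divergence = np.sum(hist_x * np.log(hist_x / hist_y))
    return divergence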
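
For the case in the question, where the inputs are plain 1-D samples, a simpler route that stays with numpy.histogram may be enough. The sketch below is not from the original answer: the function name kl_from_samples, the default of 10 bins and the eps value are assumptions made for illustration. It builds both histograms on shared bin edges, converts counts to probabilities, and evaluates KL(P || Q).

```python
import numpy as np

def kl_from_samples(a, b, bins=10, eps=1e-10):
    """Estimate KL(P || Q) from two 1-D samples using shared histogram bins."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Both histograms must use the same edges so that bin i means the same
    # interval for P and for Q.
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    edges = np.linspace(lo, hi, bins + 1)

    p_counts, _ = np.histogram(a, bins=edges)
    q_counts, _ = np.histogram(b, bins=edges)

    # Add a small constant so empty bins do not produce log(0) or division
    # by zero, then normalize counts to probabilities.
    p = (p_counts + eps) / np.sum(p_counts + eps)
    q = (q_counts + eps) / np.sum(q_counts + eps)

    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
sample_p = rng.normal(0.0, 1.0, size=1000)
sample_q = rng.normal(1.0, 1.0, size=1000)
print(kl_from_samples(sample_p, sample_q))
```

The estimate depends on the number of bins and on eps, so it is best read as a rough comparison rather than an exact divergence.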

Also, for the best results, you should choose sigma with a heuristic, for example a rule-of-thumb bandwidth estimator.
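
The answer does not say which estimator it has in mind. One common choice is Silverman's rule of thumb, sketched below as an assumption. The helper names silverman_bandwidth and sigma_in_bins are hypothetical, and dividing by the bin width to express the bandwidth in bins (the unit a Gaussian filter over a histogram works in) is likewise an assumption rather than something the source states.

```python
import numpy as np

def silverman_bandwidth(samples):
    """Silverman's rule of thumb: 0.9 * min(std, IQR / 1.34) * n ** (-1/5)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    std = np.std(samples, ddof=1)
    q75, q25 = np.percentile(samples, [75, 25])
    spread = min(std, (q75 - q25) / 1.34)
    return 0.9 * spread * n ** (-0.2)

def sigma_in_bins(samples, bins=256):
    """Express the bandwidth in units of histogram bins (assumed conversion)."""
    samples = np.asarray(samples, dtype=float)
    bin_width = (samples.max() - samples.min()) / bins
    return silverman_bandwidth(samples) / bin_width

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=500)
print(silverman_bandwidth(data))
print(sigma_in_bins(data, bins=256))
```

The bandwidth is in the units of the data, so it scales with the data; the per-bin value is what would be passed as sigma if the histogram spans the sample's full range.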
