Interpreting scipy.stats.entropy values


Problem Description


I am trying to use scipy.stats.entropy to estimate the Kullback–Leibler (KL) divergence between two distributions. More specifically, I would like to use the KL as a metric to decide how consistent two distributions are.

However, I cannot interpret the KL values. For example:

import numpy
import scipy.stats

t1 = numpy.random.normal(-2.5, 0.1, 1000)
t2 = numpy.random.normal(-2.5, 0.1, 1000)

scipy.stats.entropy(t1, t2)
= 0.0015539217193737955

Then,

t1 = numpy.random.normal(-2.5, 0.1, 1000)
t2 = numpy.random.normal(2.5, 0.1, 1000)

scipy.stats.entropy(t1, t2)
= 0.0015908295787942181

How can completely different distributions with essentially no overlap have the same KL value?

t1 = numpy.random.normal(-2.5, 0.1, 1000)
t2 = numpy.random.normal(25., 0.1, 1000)

scipy.stats.entropy(t1, t2)
= 0.00081111364805590595

This one gives an even smaller KL value (i.e. distance), which I would be inclined to interpret as "more consistent".

Any insights on how to interpret the scipy.stats.entropy value (i.e., the KL divergence as a distance) in this context?

Solution

numpy.random.normal(-2.5,0.1,1000) is a sample from a normal distribution. It's just 1000 numbers in a random order. The documentation for entropy says:

pk[i] is the (possibly unnormalized) probability of event i.
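To make that concrete, here is a minimal sketch (the numbers are made up for illustration and are not from the question) of what entropy(pk, qk) computes: it normalizes both vectors and returns the KL divergence sum(p * log(p / q)), so pk[i] and qk[i] must describe the same event i.

import numpy as np
from scipy import stats

# Made-up probability vectors; entropy() normalizes them first.
p = np.array([9.0, 1.0])            # unnormalized -> [0.9, 0.1]
q = np.array([0.5, 0.5])

print(stats.entropy(p, q))          # ~0.368
p = p / p.sum()
print(np.sum(p * np.log(p / q)))    # the same value, computed by hand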

So to get a meaningful result, you need the numbers to be "aligned" so that the same indices correspond to the same positions in the distribution. In your example t1[0] has no relationship to t2[0]. Your sample doesn't provide any direct information about how probable each value is, which is what you need for the KL divergence; it just gives you some actual values that were taken from the distribution.
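If only the samples are available, one possible way (a rough sketch, not part of this answer's approach) to get aligned probability estimates is to count both samples in the same histogram bins, so that index i refers to the same interval of values for both:

import numpy as np
from scipy import stats

t1 = np.random.normal(-2.5, 0.1, 1000)
t2 = np.random.normal(-2.4, 0.1, 1000)

bins = np.linspace(-3.5, -1.5, 41)   # shared bin edges for both samples
p, _ = np.histogram(t1, bins=bins)   # bin counts; entropy() normalizes them
q, _ = np.histogram(t2, bins=bins)

# A small constant keeps empty q-bins from pushing the ratio p/q to infinity;
# the result is a rough, noisy estimate of the KL divergence.
print(stats.entropy(p + 1e-9, q + 1e-9))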

The most straightforward way to get aligned values is to evaluate the distribution's probability density function at some fixed set of values. To do this, you need to use scipy.stats.norm (which returns a distribution object that can be manipulated in various ways) instead of np.random.normal (which only returns sampled values). Here's an example:

from scipy import stats
import numpy as np

t1 = stats.norm(-2.5, 0.1)
t2 = stats.norm(-2.5, 0.1)
t3 = stats.norm(-2.4, 0.1)
t4 = stats.norm(-2.3, 0.1)

# domain to evaluate PDF on
x = np.linspace(-5, 5, 100)

Then:

>>> stats.entropy(t1.pdf(x), t2.pdf(x))
-0.0
>>> stats.entropy(t1.pdf(x), t3.pdf(x))
0.49999995020647586
>>> stats.entropy(t1.pdf(x), t4.pdf(x))
1.999999900414918
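As a sanity check (the formula below is not part of the original answer), these numbers match the closed-form KL divergence between two normal distributions with equal standard deviation, (mu1 - mu2)**2 / (2 * sigma**2):

sigma = 0.1
print((-2.5 - (-2.4)) ** 2 / (2 * sigma ** 2))   # t1 vs t3 -> 0.5
print((-2.5 - (-2.3)) ** 2 / (2 * sigma ** 2))   # t1 vs t4 -> 2.0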

You can see that as the distributions move further apart, their KL divergence increases. (In fact, using your second example will give a KL divergence of inf because they overlap so little.)
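For instance, reusing t1 and x from above (t5 is a new name introduced here for the question's second case), you would see something like:

>>> t5 = stats.norm(2.5, 0.1)
>>> stats.entropy(t1.pdf(x), t5.pdf(x))
inf

t5.pdf(x) underflows to zero on the part of the grid where t1.pdf(x) is non-zero, so the p * log(p / q) terms diverge.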
