Jensen-Shannon Divergence

Problem Description

I have another question that I was hoping someone could help me with.

I'm using the Jensen-Shannon-Divergence to measure the similarity between two probability distributions. The similarity scores appear to be correct in the sense that they fall between 1 and 0 given that one uses the base 2 logarithm, with 0 meaning that the distributions are equal.
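
For reference (this restatement is not part of the original question), the quantity being computed is usually defined as

    JSD(P || Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M),  where M = 0.5 * (P + Q)

With the base-2 logarithm in the KL terms, JSD(P || Q) lies in [0, 1] and equals 0 exactly when P = Q.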

However, I'm not sure whether there is in fact an error somewhere and was wondering whether someone might be able to say 'yes it's correct' or 'no, you did something wrong'.

Here is the code:

from numpy import zeros, array
from math import sqrt, log


class JSD(object):
    def __init__(self):
        self.log2 = log(2)


    def KL_divergence(self, p, q):
        """ Compute KL divergence of two vectors, K(p || q)."""
        return sum(p[x] * log((p[x]) / (q[x])) for x in range(len(p)) if p[x] != 0.0 or p[x] != 0)

    def Jensen_Shannon_divergence(self, p, q):
        """ Returns the Jensen-Shannon divergence. """
        self.JSD = 0.0
        weight = 0.5
        average = zeros(len(p)) #Average
        for x in range(len(p)):
            average[x] = weight * p[x] + (1 - weight) * q[x]
            self.JSD = (weight * self.KL_divergence(array(p), average)) + ((1 - weight) * self.KL_divergence(array(q), average))
        return 1-(self.JSD/sqrt(2 * self.log2))

if __name__ == '__main__':
    J = JSD()
    p = [1.0/10, 9.0/10, 0]
    q = [0, 1.0/10, 9.0/10]
    print J.Jensen_Shannon_divergence(p, q)

The problem is that I feel that the scores are not high enough when comparing two text documents, for instance. However, this is purely a subjective feeling.

Any help is, as always, appreciated.

Answer

Note that the scipy entropy call below is the Kullback-Leibler divergence.

See: http://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence

#!/usr/bin/env python
from scipy.stats import entropy
from numpy.linalg import norm
import numpy as np

def JSD(P, Q):
    # Normalize the inputs so they are valid probability distributions
    _P = P / norm(P, ord=1)
    _Q = Q / norm(Q, ord=1)
    # Mixture (average) distribution
    _M = 0.5 * (_P + _Q)
    # JSD is the mean of the two KL divergences against the mixture
    return 0.5 * (entropy(_P, _M) + entropy(_Q, _M))
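
As a quick sanity check (this usage example is not part of the original answer), the function can be called on the question's two distributions, continuing the snippet above. Note that scipy.stats.entropy uses the natural logarithm by default, so the result is bounded by ln(2) ≈ 0.693; dividing by log(2), or passing base=2 to entropy, rescales it to the [0, 1] range discussed in the question.

p = np.array([1.0/10, 9.0/10, 0])
q = np.array([0, 1.0/10, 9.0/10])

# Natural-log JSD, bounded above by ln(2)
print(JSD(p, q))

# Rescaled to base 2, bounded above by 1.0
print(JSD(p, q) / np.log(2))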

Also note that the test case in the question looks incorrect: the sum of the p distribution does not add to 1.0.

See: http://www.itl.nist.gov/div898/handbook/eda/section3/eda361.htm
