Incremental entropy computation


Problem description


Let std::vector<int> counts be a vector of positive integers and let N := counts[0] + ... + counts[counts.size()-1] be the sum of the vector components. Setting pi := counts[i]/N, I compute the entropy using the classic formula H = -(p0*log2(p0) + ... + pn*log2(pn)).


The counts vector is changing --- counts are incremented --- and every 200 changes I recompute the entropy. After a quick Google and Stack Overflow search I couldn't find any method for incremental entropy computation. So the question: is there an incremental method for entropy, like the ones that exist for variance?
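For reference, the incremental variance methods alluded to here are presumably of the Welford kind: a single pass that keeps O(1) state (count, mean, sum of squared deviations) and folds each new value in as it arrives. A minimal sketch:

```python
# Welford's online algorithm: one pass over the data, constant state.
def welford(xs):
    n, mean, M2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        M2 += delta * (x - mean)   # note: uses the already-updated mean
    return M2 / n if n > 0 else 0.0  # population variance

print(welford([2, 4, 4, 4, 5, 5, 7, 9]))  # mean is 5, variance is 4
```

The question asks whether entropy admits an update of the same shape: old summary plus new observation in, new summary out, without revisiting the full history.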


The motivation for this question was the use of such formulas for incremental information-gain estimation in VFDT-like learners.

Solved: see this mathoverflow post.

Answer


I derived update formulas and algorithms for entropy and the Gini index and made the note available on arXiv. (The working version of the note is available here.) See also this mathoverflow answer.
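The single-sample update used in the code below can be sketched in a few lines (treating the incoming count x as a new class joining a sample with total S and entropy H; the full note covers the general case):

```latex
H = \sum_i h\!\left(\tfrac{c_i}{S}\right), \qquad h(p) = -p \log_2 p .
```

Using the identity h(pq) = q\,h(p) + p\,h(q) with q = S/(S+x), and the fact that \sum_i c_i/S = 1:

```latex
H' = \sum_i h\!\left(\tfrac{c_i}{S+x}\right) + h\!\left(\tfrac{x}{S+x}\right)
   = \frac{S}{S+x}\,H + h\!\left(\tfrac{S}{S+x}\right) + h\!\left(\tfrac{x}{S+x}\right),
```

which is exactly the expression returned by the update function.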


For the sake of convenience, I am including simple Python code demonstrating the derived formulas:

from math import log
from random import randint

# maps p to -p*log2(p) for p > 0, and to 0 otherwise
h = lambda p: -p*log(p, 2) if p > 0 else 0

# update entropy if new example x comes in 
def update(H, S, x):
    new_S = S+x
    return 1.0*H*S/new_S+h(1.0*x/new_S)+h(1.0*S/new_S)

# entropy of the union of two samples with entropies H1 and H2
# (renamed from `update' so it does not shadow the single-sample
# update above -- two defs with the same name would break test())
def merge(H1, S1, H2, S2):
    S = S1+S2
    return 1.0*H1*S1/S+h(1.0*S1/S)+1.0*H2*S2/S+h(1.0*S2/S)

# compute entropy(L) using only the `update' function
def test(L):
    S = 0.0 # sum of the sample elements
    H = 0.0 # sample entropy
    for x in L:
        H = update(H, S, x)
        S = S+x
    return H

# compute entropy using the classic equation
def entropy(L):
    n = 1.0*sum(L)
    return sum(h(x/n) for x in L)

# entry point
if __name__ == "__main__":
    L = [randint(1, 100) for k in range(100)]
    M = [randint(100, 1000) for k in range(100)]

    L_ent = entropy(L)
    L_sum = sum(L)

    M_ent = entropy(M)
    M_sum = sum(M)

    T = L+M

    print("Full   =", entropy(T))
    print("Merged =", merge(L_ent, L_sum, M_ent, M_sum))
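As a quick sanity check of the single-sample formula, the loop below (a self-contained sketch reusing the same h and update definitions as above) builds the entropy of the counts [1, 1, 2] one count at a time; the probabilities are (0.25, 0.25, 0.5), so the classic formula gives 0.5 + 0.5 + 0.5 = 1.5 bits:

```python
from math import log

# same helper as above: h(p) = -p*log2(p) for p > 0, else 0
h = lambda p: -p*log(p, 2) if p > 0 else 0

# fold a new class count x into a sample with entropy H and total S
def update(H, S, x):
    new_S = S+x
    return 1.0*H*S/new_S+h(1.0*x/new_S)+h(1.0*S/new_S)

H, S = 0.0, 0
for x in [1, 1, 2]:
    H = update(H, S, x)
    S += x
print(H)  # 1.5
```

Each step touches only the running pair (H, S), never the earlier counts, which is what makes the method incremental.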

