How to calculate clustering entropy? A working example or software code


Question

I would like to calculate the entropy of this example scheme:

http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html

Can anybody please explain step by step with real values? I know there are unlimited numbers of formulas, but I am really bad at understanding formulas :)

For example, in the given image, how to calculate purity is clearly and well explained.

The question is clear. I need an example of how to calculate the entropy of this clustering scheme, explained step by step. The calculation can be C# code or Python code.

Here is the entropy formula from the textbook:

H(Ω) = Σ_{w ∈ Ω} (N_w / N) · H(w), where H(w) = -Σ_{c ∈ C} P(w_c) · log_2 P(w_c)

I will code this in C#.

Thank you very much for your help.

I need an answer like the one given here: https://stats.stackexchange.com/questions/95731/how-to-calculate-purity

Answer

This section of the NLP book is a little confusing, I will admit, because they don't follow through with the complete calculation of the external measure of cluster entropy; instead they focus on the entropy calculation for an individual cluster. I will try to use a more intuitive set of variables and include the complete method for calculating the total external entropy.

H(Ω) = Σ_{w ∈ Ω} (N_w / N) · H(w)

where:

Ω is the set of clusters

H(w) is the entropy of a single cluster w

N_w is the number of points in cluster w

N is the total number of points.

H(w) = -Σ_{c ∈ C} P(w_c) · log_2 P(w_c)

where: c is a classification in the set C of all classifications

P(w_c) is the probability of a data point in cluster w being classified as c.

To make this usable, we can substitute the MLE (maximum likelihood estimate) for this probability to arrive at:

H(w) = -Σ_{c ∈ C} (|w_c| / n_w) · log_2(|w_c| / n_w)

where:

|w_c| is the number of points in cluster w classified as c

n_w is the total number of points in cluster w
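The MLE form above translates directly into a few lines of Python. This is a minimal sketch (the function name `cluster_entropy` is my own); note the standard convention that 0 · log_2(0) = 0, handled by skipping empty classes:

```python
from math import log2

def cluster_entropy(class_counts):
    """Entropy H(w) of one cluster, given its per-class counts |w_c|.

    Uses the MLE |w_c| / n_w for P(w_c) and treats 0 * log2(0) as 0
    by skipping classes with zero count.
    """
    n_w = sum(class_counts)
    return -sum((c / n_w) * log2(c / n_w) for c in class_counts if c > 0)
```

For example, a cluster containing 5 x's, 1 circle, and 0 diamonds would be passed as `cluster_entropy([5, 1, 0])`.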

So in the given example you have 3 clusters (w_1, w_2, w_3), and we calculate the entropy of each cluster separately over the 3 classifications (x, circle, diamond), taking 0 · log_2(0) = 0:

H(w_1) = -(5/6)·log_2(5/6) - (1/6)·log_2(1/6) - (0/6)·log_2(0/6) = 0.650

H(w_2) = -(1/6)·log_2(1/6) - (4/6)·log_2(4/6) - (1/6)·log_2(1/6) = 1.252

H(w_3) = -(2/5)·log_2(2/5) - (0/5)·log_2(0/5) - (3/5)·log_2(3/5) = 0.971

Then to find the total entropy for the set of clusters, take the sum of the cluster entropies weighted by the relative size of each cluster:

H(Ω) = (0.650 · 6/17) + (1.252 · 6/17) + (0.971 · 5/17)

H(Ω) = 0.957
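The full calculation above can be sketched in Python. The per-class counts below are read off the figure in the linked chapter (order: x, circle, diamond for clusters w_1, w_2, w_3); the variable names are my own:

```python
from math import log2

def cluster_entropy(class_counts):
    """H(w) for one cluster from its per-class counts, with 0*log2(0) = 0."""
    n_w = sum(class_counts)
    return -sum((c / n_w) * log2(c / n_w) for c in class_counts if c > 0)

# per-class counts (x, circle, diamond) for clusters w_1, w_2, w_3
clusters = [[5, 1, 0], [1, 4, 1], [2, 0, 3]]
N = sum(sum(counts) for counts in clusters)  # 17 points in total

# total entropy: each cluster's entropy weighted by its relative size N_w / N
total = sum((sum(counts) / N) * cluster_entropy(counts) for counts in clusters)
print(round(total, 3))
```

The same structure ports directly to C#: one loop over clusters, an inner loop over class counts, and `Math.Log(p, 2)` in place of `log2`.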

I hope this helps; please feel free to verify and provide feedback.
