Clustering: Cluster validation

Question

I want to apply a clustering method to a large social network dataset. The problem is how to evaluate the clustering method. Yes, I can use external, internal, and relative cluster validation methods. I used normalized mutual information (NMI) as an external validation measure on synthetic data. To check the clustering methods, I generated synthetic datasets with 5 clusters of equal size, strongly connected links inside each cluster, and weak links between clusters. I then analysed spectral clustering and modularity-based community detection on these synthetic datasets. I applied the method with the best NMI to my real-world dataset and checked the error (cost function) of my algorithm, and the result was good. Is this a sound way to test my cost function, or should I also validate the clusters found on my real-world data?
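For readers who want to reproduce this kind of benchmark, here is a minimal sketch using only the Python standard library. It is not the asker's code: the helper names (`planted_partition`, `entropy`, `nmi`), the edge probabilities `p_in` and `p_out`, the cluster size, and the "noisy" stand-in prediction are all assumptions made for illustration. NMI is normalized here by the arithmetic mean of the two entropies, which is only one of several conventions in use.

```python
import random
from collections import Counter
from itertools import combinations
from math import log


def planted_partition(n_clusters=5, size=20, p_in=0.5, p_out=0.02, seed=0):
    """Return (edges, labels): equal-sized clusters with dense links
    inside each cluster and sparse links between clusters."""
    rng = random.Random(seed)
    labels = [node // size for node in range(n_clusters * size)]
    edges = []
    for u, v in combinations(range(len(labels)), 2):
        p = p_in if labels[u] == labels[v] else p_out
        if rng.random() < p:
            edges.append((u, v))
    return edges, labels


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Mutual information divided by the mean of the two entropies."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((c / n) * log(c * n / (count_a[x] * count_b[y]))
             for (x, y), c in joint.items())
    mean_entropy = (entropy(a) + entropy(b)) / 2
    return 1.0 if mean_entropy == 0 else mi / mean_entropy


edges, truth = planted_partition()

# Hypothetical stand-in for a clustering result: the true labels with
# 10 nodes reassigned at random. Replace this with the labels returned
# by the method under test (e.g. spectral clustering on `edges`).
rng = random.Random(1)
predicted = list(truth)
for node in rng.sample(range(len(truth)), 10):
    predicted[node] = rng.randrange(5)

print("NMI(truth, truth)     =", round(nmi(truth, truth), 3))
print("NMI(truth, predicted) =", round(nmi(truth, predicted), 3))
```

A high NMI on such a benchmark only shows that the method recovers this particular planted structure; it says nothing yet about the real-world graph, whose clusters may be unequal in size or far less cleanly separated.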

Thanks.

Recommended answer

Try more than one measure.

There are a dozen cluster validation measures, and it's hard to predict which one is most appropriate for a problem. The differences between them are not really understood yet, so it's best if you consult more than one.
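As an illustration of consulting more than one measure, the sketch below scores the same candidate clusterings with three pair-counting measures: the adjusted Rand index, the Jaccard index, and the Fowlkes-Mallows index. The choice of these three, the function names, and the toy labelings are assumptions made for this example, not something the answer prescribes. The code assumes each labeling contains at least one cluster with two or more members.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(a, b):
    """Return (pairs together in both, pairs together in a,
    pairs together in b, total number of pairs)."""
    both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    same_a = sum(comb(c, 2) for c in Counter(a).values())
    same_b = sum(comb(c, 2) for c in Counter(b).values())
    return both, same_a, same_b, comb(len(a), 2)


def adjusted_rand_index(a, b):
    """Rand index corrected for chance agreement."""
    both, same_a, same_b, total = pair_counts(a, b)
    expected = same_a * same_b / total
    max_index = (same_a + same_b) / 2
    if max_index == expected:
        return 1.0
    return (both - expected) / (max_index - expected)


def jaccard_index(a, b):
    """Pairs together in both, over pairs together in at least one."""
    both, same_a, same_b, _ = pair_counts(a, b)
    return both / (same_a + same_b - both)


def fowlkes_mallows(a, b):
    """Geometric mean of pairwise precision and recall."""
    both, same_a, same_b, _ = pair_counts(a, b)
    return both / sqrt(same_a * same_b)


truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
candidate_1 = [0, 0, 1, 1, 1, 1, 2, 2, 2]   # one node misplaced
candidate_2 = [0, 0, 0, 0, 0, 0, 1, 1, 1]   # two clusters merged

for name, labels in [("candidate_1", candidate_1), ("candidate_2", candidate_2)]:
    print(name,
          "ARI=%.3f" % adjusted_rand_index(truth, labels),
          "Jaccard=%.3f" % jaccard_index(truth, labels),
          "FM=%.3f" % fowlkes_mallows(truth, labels))
```

When the measures agree on which candidate is closer to the reference, the ranking is more trustworthy; when they disagree, that is a signal to look at what kind of error each candidate makes.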

Also note that if you don't use a normalized measure, the baseline may be really high. So the measures are mostly useful for saying "result A is more similar to result B than to result C", but they should not be taken as an absolute measure of quality. They are a relative measure of similarity.
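The baseline effect can be seen with the plain (unadjusted) Rand index. The sketch below is an illustration under assumed settings, not part of the original answer: the helper names, the 5-by-20 cluster layout, and the shuffle-based baseline estimate are all choices made for this example. It compares two labelings that are unrelated by construction and estimates what score random permutations of the predicted labels would obtain.

```python
import random
from collections import Counter
from math import comb


def rand_index(a, b):
    """Fraction of node pairs on which two labelings agree
    (together in both or apart in both); not chance-corrected."""
    both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    same_a = sum(comb(c, 2) for c in Counter(a).values())
    same_b = sum(comb(c, 2) for c in Counter(b).values())
    total = comb(len(a), 2)
    # agreements = pairs together in both + pairs apart in both
    return (total + 2 * both - same_a - same_b) / total


def shuffled_baseline(truth, predicted, trials=200, seed=0):
    """Average score when the predicted labels are randomly permuted."""
    rng = random.Random(seed)
    shuffled = list(predicted)
    scores = []
    for _ in range(trials):
        rng.shuffle(shuffled)
        scores.append(rand_index(truth, shuffled))
    return sum(scores) / trials


truth = [node // 20 for node in range(100)]     # 5 clusters of 20 nodes
predicted = [node % 5 for node in range(100)]   # unrelated to truth by construction

score = rand_index(truth, predicted)
baseline = shuffled_baseline(truth, predicted)
print(f"Rand index: {score:.3f}, shuffled baseline: {baseline:.3f}")
```

On this toy setup the Rand index comes out at about 0.68 even though the predicted labels carry no information about the true clusters, so a raw score in that range means nothing on its own. Comparing against the shuffled baseline, or using a chance-adjusted measure, makes the number interpretable as a relative similarity.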
