Performance Analysis of Clustering Algorithms

Question

I have been given two data sets and want to perform cluster analysis on them using KNIME.

Once the clustering is done, I want to compare the performance of two different clustering algorithms.

When analysing the performance of clustering algorithms, is the measure time (the algorithm's time complexity, the time taken to cluster the data, and so on), the validity of the resulting clusters, or both?

Is there any other angle one can look at to assess the performance (or lack of it) of a clustering algorithm?

Thanks in advance

  • T

Recommended answer

It depends a lot on what data you have available.

A common way of measuring performance is against existing ("external") labels, although that makes more sense for classification than for clustering. There are around two dozen measures you can use for this.
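As an illustration of one such external measure, here is a minimal sketch of the adjusted Rand index, which counts pairs of points that both labelings agree on and corrects that count for chance. This is not something the answer prescribes; the function name adjusted_rand_index and the toy label lists are made up for the example, and it uses only the Python standard library (Python 3.8+ for math.comb).

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected pair-counting agreement between two labelings.

    Hypothetical helper for illustration; label values themselves do not
    matter, only which points are grouped together.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two points to form a pair")

    # Contingency counts: points sharing each (true label, predicted label).
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2        # best achievable agreement
    if max_index == expected:
        # Degenerate case, e.g. both labelings are one single cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Same grouping with renamed cluster ids scores as a perfect match.
same_grouping = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A grouping that splits every true cluster scores below zero.
crossed_grouping = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A score of 1 means the clustering reproduces the external labels exactly, and values near 0 mean it agrees with them no better than a random assignment would.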

When using an "internal" quality measure, make sure that it is independent of the algorithms being compared. For example, k-means itself optimizes such a measure (the within-cluster sum of squares), so it will tend to come out best whenever the evaluation uses that same measure.
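To make the k-means example concrete, the sketch below computes the within-cluster sum of squares, the quantity k-means tries to minimize. The function name within_cluster_sse and the four toy points are assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict


def within_cluster_sse(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster mean.

    Hypothetical helper for illustration; points are equal-length tuples.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")

    # Group the points by their assigned cluster label.
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    sse = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # The cluster mean is the centroid that minimizes squared distance.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        sse += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return sse


# Toy data: two tight pairs of points, far apart from each other.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
compact_sse = within_cluster_sse(points, [0, 0, 1, 1])  # groups nearby points
mixed_sse = within_cluster_sse(points, [0, 1, 1, 0])    # groups distant points
```

Because k-means searches directly for assignments with a low value of this score, ranking k-means against, say, a density-based or hierarchical method by this score alone mostly measures how closely each method's objective resembles that of k-means.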
