"k均值"和"k均值"之间的区别是什么?以及“模糊c表示"目标功能? [英] whats is the difference between "k means" and "fuzzy c means" objective functions?


Question

I am trying to see whether the performance of the two can be compared based on the objective functions they work on.

Answer

BTW, the Fuzzy C-Means (FCM) clustering algorithm is also known as Soft K-Means.

The objective functions are virtually identical, the only difference being the introduction of a vector which expresses the percentage of belonging of a given point to each of the clusters. This vector is submitted to a "stiffness" exponent aimed at giving more importance to the stronger connections (and conversely at minimizing the weight of weaker ones); incidentally, when the stiffness factor tends towards infinity the resulting vector becomes a binary matrix, hence making the FCM model identical to that of K-Means.
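The two objectives can be put side by side in a few lines of NumPy. This is a minimal sketch of my own, not code from the question: the toy data, the centre positions, and the fuzzifier `m = 2` are illustrative choices, and the membership formula used is the standard FCM update for that `m`.

```python
import numpy as np

# Toy data: six points in 2-D and two cluster centres (illustrative values only).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [3.0, 3.0], [3.1, 2.9], [2.9, 3.1]])
C = np.array([[0.1, 0.1], [3.05, 3.0]])

# Squared distances from every point to every centre, shape (n_points, n_clusters).
d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)

# K-Means objective: each point counts only towards its nearest centre.
j_kmeans = d2.min(axis=1).sum()

# FCM objective: every point contributes to every centre, weighted by its
# membership u raised to the "stiffness" exponent m.  For m = 2 the standard
# update makes memberships proportional to inverse squared distance.
m = 2.0
inv = 1.0 / d2                              # assumes no point sits exactly on a centre
u = inv / inv.sum(axis=1, keepdims=True)    # each row of memberships sums to 1
j_fcm = (u ** m * d2).sum()
```

With the optimal memberships plugged in, the fuzzy objective for this `m = 2` case is never larger than the hard one, since spreading a point's weight across centres can only lower its weighted cost.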

I think that, except for some possible issues with clusters which have no points assigned to them, it is possible to emulate the K-Means algorithm with the FCM one by simulating an infinite stiffness factor (i.e. by introducing a function which changes the biggest value in the vector to 1 and zeros out the other values, in lieu of the exponentiation of the vector). This is of course a very inefficient way of running K-Means, because the algorithm then has to perform as many operations as a true FCM (if only with 1 and 0 values, which simplifies the arithmetic but not the complexity).
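That emulation can be sketched directly: keep the FCM-shaped update loop but replace the exponentiation step with a hard arg-max membership. The function names and toy data below are my own, and the sketch deliberately ignores the empty-cluster issue mentioned above (an empty cluster would divide by zero in the centre update).

```python
import numpy as np

def hard_membership(d2):
    """In lieu of exponentiating the membership vector: set the entry for
    the nearest centre to 1 and zero out all the others."""
    u = np.zeros_like(d2)
    u[np.arange(d2.shape[0]), d2.argmin(axis=1)] = 1.0
    return u

def fcm_style_step(X, C):
    """One FCM-shaped update with hard memberships -- identical in effect
    to one Lloyd (K-Means) iteration."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    u = hard_membership(d2)                  # binary (n_points, k) matrix
    # Membership-weighted centre update; with 0/1 weights this is just the
    # mean of each cluster's points (and divides by zero for empty clusters).
    return (u.T @ X) / u.sum(axis=0)[:, None]

X = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0]])
C = fcm_style_step(X, np.array([[1.0, 1.0], [3.0, 3.0]]))
# C is now the per-cluster means: [[0.1, 0.0], [4.1, 4.0]]
```

As the answer notes, a real K-Means implementation would simply assign each point to its nearest centre rather than carry the full `(n, k)` membership matrix around.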

With regard to performance, FCM therefore needs to perform k (i.e. the number of clusters) multiplications for each point, for each dimension (not counting the exponentiation needed to take stiffness into account). This, plus the overhead needed for computing and managing the proximity vector, explains why FCM is quite a bit slower than plain K-Means.

But FCM/Soft-K-Means is less "stupid" than Hard-K-Means when it comes, for example, to elongated clusters (when points otherwise consistent in the other dimensions tend to scatter along one or two particular dimensions), and that's why it's still around ;-)

Following up on my original reply:

Just thinking about it, without any "math" considerations, FCM may converge faster than hard K-Means, somewhat offsetting FCM's greater computational requirements.

May 2018 edit:

There is actually no reputable research that I could identify which supports my above hunch about FCM's faster rate of convergence. Thank you Benjamin Horn for keeping me honest ;-)

