Online k-means clustering

Question

Is there an online version of the k-means clustering algorithm?

By online I mean that every data point is processed serially, one at a time as it enters the system, which saves computing time when used in real time.

I have written one myself with good results, but I would really prefer to have something "standardized" to refer to, since it is to be used in my master's thesis.

Also, does anyone have advice on other online clustering algorithms? (lmgtfy failed ;))

Recommended answer

Yes, there is. Google failed to find it because it's more commonly known as "sequential k-means".

You can find two pseudo-code implementations of sequential k-means in this section of some Princeton CS class notes by Richard Duda. I've reproduced one of the two implementations below:

Make initial guesses for the means m1, m2, ..., mk
Set the counts n1, n2, ..., nk to zero
Until interrupted
    Acquire the next example, x
    If mi is closest to x
        Increment ni
        Replace mi by mi + (1/ni)*( x - mi)
    end_if
end_until
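
A short derivation of why the replacement step in the pseudo-code keeps mi equal to the running average of the points assigned to cluster i. The superscript notation m^{(n)} (the mean after n assigned points x_1, ..., x_n) is introduced here for illustration and is not part of the original notes:

```latex
\begin{aligned}
m^{(n)} &= \frac{1}{n}\sum_{j=1}^{n} x_j
         = \frac{1}{n}\Bigl((n-1)\,m^{(n-1)} + x_n\Bigr) \\
        &= m^{(n-1)} + \frac{1}{n}\bigl(x_n - m^{(n-1)}\bigr)
\end{aligned}
```

For n = 1 the update gives m^{(1)} = x_1, so the first point assigned to a cluster overwrites that cluster's initial guess.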

The beautiful thing about it is that you only need to remember the mean of each cluster and the count of data points assigned to that cluster. Once you have updated those two variables, you can throw away the data point.
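
The pseudo-code above could be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the class name `SequentialKMeans`, the method `update`, the use of Euclidean distance, and the sample points are all assumptions made for the example.

```python
import math
from typing import List, Sequence


class SequentialKMeans:
    """Sequential (online) k-means: stores one mean and one count per cluster."""

    def __init__(self, initial_means: Sequence[Sequence[float]]) -> None:
        # Initial guesses for the means m1..mk; the counts n1..nk start at zero.
        self.means: List[List[float]] = [list(m) for m in initial_means]
        self.counts: List[int] = [0] * len(self.means)

    def update(self, x: Sequence[float]) -> int:
        """Assign x to the closest mean, update that mean, and return its index."""
        # Euclidean distance is an assumption; the pseudo-code only says "closest".
        i = min(range(len(self.means)), key=lambda j: math.dist(self.means[j], x))
        self.counts[i] += 1
        rate = 1.0 / self.counts[i]
        # mi <- mi + (1/ni) * (x - mi), applied per coordinate.
        self.means[i] = [m + rate * (xi - m) for m, xi in zip(self.means[i], x)]
        return i


model = SequentialKMeans(initial_means=[[0.0, 0.0], [10.0, 10.0]])
for point in [[1.0, 1.0], [9.0, 11.0], [3.0, 1.0], [11.0, 9.0]]:
    model.update(point)  # the point itself is not stored anywhere

print(model.means)   # [[2.0, 1.0], [10.0, 10.0]]
print(model.counts)  # [2, 2]
```

Memory use is fixed at k means and k counts no matter how many points are streamed through `update`.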

I'm not sure where you would find a citation for it. I would start by looking in Duda's classic text Pattern Classification and Scene Analysis or the newer edition, Pattern Classification. If it's not there, you could try Chris Bishop's newest book or Daphne Koller and Nir Friedman's recent text.
