Can k-means clustering do classification?

Question

I want to know whether the k-means clustering algorithm can be used for classification.

Suppose I have run a simple k-means clustering.

Assume I have a lot of data and I cluster it with k-means, which gives me 2 clusters, A and B. The distance measure used for the centroids is Euclidean distance.

Cluster A is on the left side.

Cluster B is on the right side.

Now suppose I get one new data point. What should I do?

  1. Run the k-means clustering algorithm again and see which cluster the new data point ends up in?

  2. Keep the final centroids and use Euclidean distance to decide which cluster the new data point belongs to?

  3. Some other method?

Answer

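The simplest method is of course 2: assign each object to the closest centroid. (Technically, use the sum of squared differences rather than Euclidean distance; this is more correct for k-means and saves you a sqrt computation. Since the square root is monotonic, the nearest centroid is the same either way.)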
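As a rough illustration of option 2, here is a minimal sketch in plain Python. The centroid coordinates and the new point are made-up values, and the function names are hypothetical; in practice the centroids would be the ones saved from your own k-means run.

```python
def squared_distance(a, b):
    """Sum of squared coordinate differences (no square root needed)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


# Hypothetical centroids saved from an earlier k-means run:
# index 0 plays the role of cluster A (left), index 1 of cluster B (right).
centroids = [(-2.0, 0.0), (3.0, 0.5)]
new_point = (2.2, -0.3)

cluster = assign_to_nearest_centroid(new_point, centroids)
print("A" if cluster == 0 else "B")
```

The new point is never added to the clustering, so the existing clusters A and B stay exactly as they were.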

Method 1 is fragile, as k-means may give you a completely different solution, particularly if it didn't fit your data well in the first place (e.g. too many dimensions, clusters of very different sizes, too many clusters, ...).

However, the following method may be even more reasonable:

3. Train an actual classifier.

Yes, you can use k-means to produce an initial partitioning, then assume that the k-means partitions could be reasonable classes (you really should validate this at some point, though), and then continue as you would if the data had been labeled by a user.

I.e. run k-means, then train an SVM on the resulting clusters. Then use the SVM for classification.
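The k-means-then-SVM idea could look roughly like the sketch below. It assumes scikit-learn and NumPy are available; the synthetic blobs, the linear kernel, and the variable names are illustrative choices, not something the answer prescribes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

# Synthetic stand-in data: two well-separated blobs (left and right).
rng = np.random.default_rng(0)
left = rng.normal(loc=(-5.0, 0.0), scale=0.5, size=(50, 2))
right = rng.normal(loc=(5.0, 0.0), scale=0.5, size=(50, 2))
X = np.vstack([left, right])

# Step 1: k-means produces the initial partition.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
pseudo_labels = kmeans.labels_

# Step 2: treat the cluster ids as class labels and train a classifier.
svm = SVC(kernel="linear").fit(X, pseudo_labels)

# Step 3: classify new data with the SVM; k-means is not run again.
new_points = np.array([[-4.0, 0.3], [6.0, -0.2]])
predicted = svm.predict(new_points)
print(predicted)
```

The cluster ids are arbitrary (which blob gets 0 and which gets 1 depends on the run), so the labels only mean something relative to this particular k-means result. That is one more reason to check that the clusters are sensible classes before relying on the classifier.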

k-NN classification, or even assigning each object to the nearest cluster center (option 2), can be seen as very simple classifiers. The latter is a 1-NN classifier, "trained" on the cluster centroids only.
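To make the 1-NN remark concrete, the following sketch (again assuming scikit-learn, with made-up data and query points) fits a 1-nearest-neighbor classifier on nothing but the centroids and checks that it agrees with the nearest-centroid assignment that `KMeans.predict` performs.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data: two well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=(-5.0, 0.0), scale=0.5, size=(40, 2)),
    rng.normal(loc=(5.0, 0.0), scale=0.5, size=(40, 2)),
])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# "Train" 1-NN on the centroids only: one training point per cluster,
# labeled with that cluster's index.
centers = kmeans.cluster_centers_
one_nn = KNeighborsClassifier(n_neighbors=1).fit(centers, np.arange(len(centers)))

# Both approaches pick the closest centroid for each query point.
queries = np.array([[-3.0, 1.0], [4.0, -1.0], [-6.0, 0.0]])
same = np.array_equal(one_nn.predict(queries), kmeans.predict(queries))
print(same)
```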
