k-means in Python: determine which data are associated with each centroid


Question

I've been using scipy.cluster.vq.kmeans for k-means clustering, and I was wondering whether there is a way to determine which centroid each data point is (putatively) associated with.

Clearly you could do this manually, but as far as I can tell the kmeans function doesn't return this information.

Recommended answer

scipy.cluster.vq also provides a function kmeans2, which returns the labels along with the centroids.

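In [6]: import numpy as np

In [7]: from scipy.cluster.vq import kmeans2

In [8]: X = np.random.randn(100, 2)  # scipy.randn is no longer available in current SciPy

In [9]: centroids, labels = kmeans2(X, 3)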

In [10]: labels
Out[10]: 
array([2, 1, 2, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 2, 2, 1, 2, 1, 2, 1, 2, 0,
       1, 0, 2, 0, 1, 2, 0, 1, 0, 1, 1, 2, 2, 2, 2, 1, 2, 1, 1, 1, 2, 0, 0,
       2, 2, 0, 1, 0, 0, 0, 2, 2, 2, 0, 0, 1, 2, 1, 0, 0, 0, 2, 1, 1, 1, 1,
       1, 0, 0, 1, 0, 1, 2, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 2, 0, 2, 2, 0,
       1, 1, 0, 1, 0, 0, 0, 2])
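Once the labels are available, a boolean mask on them selects the data belonging to each centroid. The following is a minimal sketch, not part of the original answer: the variable name `clusters` is an illustrative choice, and the first three rows of `X` are passed as initial centroids (`minit="matrix"`) only to make the run deterministic.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Synthetic data, same shape as in the session above
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))

# Assumption: use the first three observations as the initial centroids
# so that the result does not depend on random initialization.
centroids, labels = kmeans2(X, X[:3], minit="matrix")

# labels[i] is the index of the centroid assigned to X[i], so a boolean
# mask on labels picks out the members of each cluster.
clusters = {k: X[labels == k] for k in range(len(centroids))}

for k, members in clusters.items():
    print(k, centroids[k], members.shape[0])
```

Each entry of `clusters` holds the rows of `X` assigned to that centroid, and the group sizes add up to the number of observations.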

Otherwise, if you must use kmeans, you can use vq to get the labels:

In [17]: from scipy.cluster.vq import kmeans, vq

In [18]: codebook, distortion = kmeans(X, 3)

In [21]: code, dist = vq(X, codebook)

In [22]: code
Out[22]: 
array([1, 0, 1, 0, 2, 2, 2, 0, 1, 1, 0, 2, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1,
       2, 2, 1, 2, 0, 1, 1, 0, 2, 2, 0, 1, 0, 1, 0, 2, 1, 2, 0, 2, 1, 1, 1,
       0, 1, 2, 0, 1, 2, 2, 1, 1, 1, 2, 2, 0, 0, 2, 2, 2, 2, 1, 0, 2, 2, 2,
       0, 1, 1, 2, 1, 0, 0, 0, 0, 1, 2, 1, 2, 0, 2, 0, 2, 2, 1, 1, 1, 1, 1,
       2, 0, 2, 0, 2, 1, 1, 1])
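For reference, the "manual" approach mentioned in the question amounts to computing the distance from every point to every centroid and taking the nearest one. The sketch below is an illustration rather than part of the original answer: it compares that manual computation with the output of vq, and it passes initial centroids to kmeans as an array (an assumption made here only for reproducibility).

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))

# Assumption: start from the first three observations instead of a random guess
codebook, distortion = kmeans(X, X[:3])

# vq assigns each observation to its nearest code (centroid)
code, dist = vq(X, codebook)

# Manual equivalent: pairwise Euclidean distances, shape (n_points, n_centroids)
d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
manual_code = d.argmin(axis=1)   # index of the nearest centroid
manual_dist = d.min(axis=1)      # distance to that centroid

print(np.array_equal(code, manual_code))
print(np.allclose(dist, manual_dist))
```

Here `code` plays the same role as `labels` from kmeans2, and `dist` gives the distance from each observation to its assigned centroid.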

Documentation: scipy.cluster.vq
