Using scipy kmeans for cluster analysis


Question

I want to understand scipy.cluster.vq.kmeans.

Having a number of points distributed in 2D space, the problem is to group them into clusters. This problem came to my attention reading this question, and I was thinking that scipy.cluster.vq.kmeans would be the way to go.

This is the data:

Using the following code, the aim would be to get the center point of each of the 25 clusters.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import vq, kmeans, whiten

# generate 25 well separated clusters on a 5x5 grid (centers at 0, 4, 8, 12, 16)
pos = np.arange(0,20,4)
scale = 0.4
size = 50
x = np.array([np.random.normal(i,scale,size*len(pos)) for i in pos]).flatten()
y = np.array([np.array([np.random.normal(i,scale,size) for i in pos]) for j in pos]).flatten()


plt.scatter(x,y, s=16, alpha=0.4)


#perform clustering with scipy.cluster.vq.kmeans
features = np.c_[x,y]

# take raw data to cluster
clusters = kmeans(features,25)
p = clusters[0]
plt.scatter(p[:,0],p[:,1], s=81, c="crimson")

# perform whitening (normalization to std) first
whitened = whiten(features) 
clustersw = kmeans(whitened,25)
q = clustersw[0]*features.std(axis=0)
plt.scatter(q[:,0],q[:,1], s=25, c="gold")

plt.show()

The result looks like this:

The red dots mark the locations of the cluster centers without whitening, the yellow points those obtained with whitening. While they are different, the main problem is that they are obviously not all at the correct positions. Because the clusters are all well separated, I'm having trouble understanding why this simple clustering fails.

I read this question, which reports kmeans not giving accurate results, but the answer is not really satisfactory. The suggested solution of using kmeans2 with minit='points' did not work either; i.e. kmeans2(features, 25, minit='points') gives a similar result to the above.

So the question would be: is there a way to perform this easy clustering problem with scipy.cluster.vq.kmeans? And if so, how would I make sure to get the correct result?

Answer

On data like this, whitening does not make a difference: your x and y axes are already similarly scaled.
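
A quick way to see this is a minimal check (reusing the features array from the code above): whiten simply divides each column by its standard deviation, so when the columns have roughly the same spread it only rescales both axes by about the same factor.

import numpy as np
from scipy.cluster.vq import whiten

# whiten() divides each feature (column) by its standard deviation;
# x and y cover the same 5x5 grid here, so their standard deviations are
# nearly equal and whitening barely changes the geometry of the data.
whitened = whiten(features)
print(features.std(axis=0))   # the two standard deviations are almost identical
np.testing.assert_allclose(whitened, features / features.std(axis=0))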

K-means does not reliably find the global optimum; it tends to get stuck in local optima. That is why it is common to run it several times and keep only the best fit, and to use smarter initialization procedures such as k-means++.
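
A minimal sketch of both remedies, reusing the features array from the question (and assuming SciPy >= 1.2, which added the '++' initialization to kmeans2):

from scipy.cluster.vq import kmeans, kmeans2

# Option 1: restart plain kmeans many times; iter is the number of
# independent runs, and the codebook with the lowest distortion is returned.
centers, distortion = kmeans(features, 25, iter=100)
print(distortion)  # lower is better; compare against a single run

# Option 2: kmeans2 with k-means++ seeding, which spreads the initial
# centroids over the data and is far less likely to get stuck.
centers_pp, labels = kmeans2(features, 25, minit='++')

With 25 well separated clusters, enough restarts or k-means++ seeding should place the returned centers on the actual cluster centers; how many restarts are enough is a judgment call.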

