Understanding output from kmeans clustering in python


Question

I have two distance matrices, each 232*232, where the column and row labels are identical. So this would be an abridged version of the two, where A, B, C and D are the names of the points between which the distances are measured:

  A  B  C  D ...    A  B  C  D  ...
A 0  1  5  3      A 0  5  3  9
B 4  0  4  1      B 2  0  7  8  
C 2  6  0  3      C 2  6  0  1
D 2  7  1  0      D 5  2  5  0
...               ...

The two matrices therefore represent the distances between pairs of points in two different networks. I want to identify clusters of pairs that are close together in one network and far apart in the other. I attempted to do this by first adjusting the distances in each matrix by dividing every distance by the largest distance in the matrix. I then subtracted one matrix from the other and applied a clustering algorithm to the resultant matrix. The algorithm I was advised to use for this was the k-means algorithm. The hope was that I could identify clusters of positive numbers that would correspond to pairs that were very close in matrix one and far apart in matrix two, and vice versa for clusters of negative numbers.
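In code, that preprocessing step looks roughly like this (a sketch only; the input file names below are placeholders for wherever the two matrices are stored):

import numpy as np

# placeholder file names; each file holds one 232 x 232 distance matrix
network_one = np.load('network_one_distances.npy')
network_two = np.load('network_two_distances.npy')

# scale each matrix so that its largest distance becomes 1
network_one_scaled = network_one / network_one.max()
network_two_scaled = network_two / network_two.max()

# the sign of each entry then says in which network the pair is relatively closer
difference_matrix = network_one_scaled - network_two_scaled
np.save('difference_matrix_file.npy', difference_matrix)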

Firstly, I've read quite a bit about how to implement k-means in Python, and I'm aware that there are multiple different modules that can be used. I've tried all three of these:

1.

import sklearn.cluster
import numpy as np

data = np.load('difference_matrix_file.npy')  # loads difference matrix from file

a = np.asarray(data)   # ensure a plain 2-D array (232 x 232)
clust_centers = 3

# k_means returns a tuple: (cluster centres, label per row, inertia)
model = sklearn.cluster.k_means(a, clust_centers)
print(model)

2.

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

difference_matrix = np.load('difference_matrix_file.npy')  # loads difference matrix from file

data = pd.DataFrame(difference_matrix)
model = KMeans(n_clusters=3)
model.fit(data)        # fit() returns the fitted estimator itself
print(model.labels_)   # cluster assignment (0, 1 or 2) for each of the 232 rows

3.

import numpy as np
from scipy.cluster.vq import vq, kmeans, whiten

np.set_printoptions(threshold=np.inf)  # print full arrays instead of truncating

difference_matrix = np.load('difference_matrix_file.npy')  # loads difference matrix from file

whitened = whiten(difference_matrix)        # scale each column to unit variance
codebook, distortion = kmeans(whitened, 3)  # kmeans returns (centroids, mean distortion)
labels, _ = vq(whitened, codebook)          # assign each row to its nearest centroid
print(codebook)
print(labels)

What I'm struggling with is how to interpret the output from these scripts. (I might add at this point that I'm neither a mathematician nor a computer scientist, if the reader hadn't already guessed.) I was expecting the output of the algorithm to be lists of coordinates of clustered pairs, one for each cluster (so three in this case), that I could then trace back to my two original matrices to identify the names of the pairs of interest.

However, what I get is an array containing a list of numbers (one for each cluster), but I don't really understand what these numbers are. They don't obviously correspond to what I had in my input matrix, other than the fact that there are 232 items in each list, which is the same as the number of rows and columns in the input matrix. And the last item in the array is another single number, which I presume must be the centroid of the clusters, but there isn't one for each cluster, just one for the whole array.

I've been trying to figure this out for quite a while now, but I'm struggling to get anywhere. Whenever I search for how to interpret the output of kmeans, I just get explanations of how to plot my clusters on a graph, which isn't what I want to do. Please can someone explain what I'm seeing in my output and how I can get from this to the coordinates of the items in each cluster?

Answer

You have two issues here, and the recommendation of k-means probably was not very good...

1. k-means needs a coordinate data matrix, not a distance matrix.

In order to compute a centroid, it needs the original coordinates. If you don't have coordinates like this, you probably should not be using k-means.
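As a tiny illustration with made-up 2-D coordinates (not the data from the question): a centroid is just the coordinate-wise mean of the points assigned to a cluster, and that mean cannot be computed from pairwise distances alone.

import numpy as np

# made-up coordinates for the members of one cluster
cluster_points = np.array([[0.0, 0.0],
                           [2.0, 0.0],
                           [1.0, 3.0]])

centroid = cluster_points.mean(axis=0)  # coordinate-wise mean -> [1.0, 1.0]
print(centroid)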

2. If you compute the difference of two distance matrices, small values correspond to points that have a similar distance in both. These could still be very far away from each other! So if you use this matrix as a new "distance" matrix, you will get meaningless results. Consider points A and B, which have the maximum distance in both original graphs. After your procedure, they will have a difference of 0 and will thus be considered identical now.
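A toy illustration of that failure mode, again with made-up numbers:

# made-up distances between A and B: the farthest-apart pair in both networks
dist_in_network_one = 9.0
dist_in_network_two = 9.0

# after scaling each matrix by its own maximum, both entries become 1.0,
# so their difference is 0 -- the same value a pair of coincident points gets
difference = dist_in_network_one / 9.0 - dist_in_network_two / 9.0
print(difference)  # 0.0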

So you haven't understood the input of k-means; no wonder you do not understand the output.
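For what it's worth, the array described in the question is the tuple returned by sklearn.cluster.k_means: the cluster centres, one label per row, and the overall inertia. A minimal sketch of unpacking it (point_names is a hypothetical list holding the 232 row labels in matrix order):

import numpy as np
import sklearn.cluster

difference_matrix = np.load('difference_matrix_file.npy')

# hypothetical: the point names, in the same order as the matrix rows
point_names = ['A', 'B', 'C', 'D']  # ... and so on, up to 232 entries

# k_means returns three things:
#   centroids: 3 x 232 array, one row per cluster centre
#   labels:    one integer (0, 1 or 2) per input row -- hence the 232-long lists
#   inertia:   a single number, the total squared distance to the nearest centres
centroids, labels, inertia = sklearn.cluster.k_means(difference_matrix, 3)

# group the row names by the cluster they were assigned to
for k in range(3):
    members = [name for name, label in zip(point_names, labels) if label == k]
    print(k, members)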

I'd rather treat the difference matrix as a similarity matrix (try absolute values, positives only, negatives only). Then use hierarchical clustering. But you will need an implementation that works with a similarity matrix; the usual implementations for a distance matrix will not work.
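For orientation only, here is one possible route in SciPy: treat the absolute differences as similarities, convert them into a pseudo-distance (an assumption on my part, since scipy.cluster.hierarchy.linkage expects distances rather than similarities), and cut the resulting tree into a fixed number of clusters.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

difference_matrix = np.load('difference_matrix_file.npy')

# treat the absolute difference as a similarity: large values mark pairs that
# behave very differently in the two networks
similarity = np.abs(difference_matrix)

# assumption: convert similarity into a pseudo-distance so linkage() can use it
distance = similarity.max() - similarity
distance = (distance + distance.T) / 2.0   # force symmetry
np.fill_diagonal(distance, 0.0)            # zero self-distance

condensed = squareform(distance, checks=False)       # condensed form for linkage
tree = linkage(condensed, method='average')          # average-linkage hierarchy
labels = fcluster(tree, t=3, criterion='maxclust')   # cut the tree into 3 clusters
print(labels)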
