Passing a distance matrix to k-means clustering in sklearn
Question
According to the sklearn KMeans documentation, k-means requires a matrix of shape=(n_samples, n_features). However, I provided a distance matrix of shape=(n_samples, n_samples), where each entry holds the distance between two strings. The time series were converted into strings using the SAX representation.
When I ran the clustering on the distance matrix, it gave good results. What could be the reason for this? As far as I know, k-medoids is the method that works with a distance matrix.
Recommended answer
K-means, as the name indicates, uses means.
Computing the arithmetic mean requires access to the original features; a distance matrix cannot be used for that.
K-means also does not use pairwise distances, so the distance matrix is useless for this algorithm.
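As a rough illustration of what happens when a distance matrix is passed anyway, the sketch below feeds a small (n_samples, n_samples) matrix to `KMeans`. The toy data (two groups of 1-D points) and all variable names are assumptions made up for this example, not part of the original question. The fitted centers have one coordinate per sample, which suggests that each row of the matrix is simply treated as a feature vector. That may also be why the results can look reasonable: similar objects tend to have similar rows of distances.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated groups of 1-D points.
rng = np.random.default_rng(0)
points = np.concatenate([rng.normal(0.0, 0.1, 5), rng.normal(10.0, 0.1, 5)])

# Pairwise absolute differences: a (10, 10) distance matrix.
D = np.abs(points[:, None] - points[None, :])

# KMeans has no "precomputed" mode; it reads D as 10 samples with 10 features.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(D)

# Each center is the mean of some rows of D, so it has n_samples coordinates.
print(km.cluster_centers_.shape)
print(km.labels_)
```

The means computed here are means of distance rows, not means of the original strings or time series, so the objective being minimized is not the one k-means is normally described with.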
Choose a different algorithm instead, such as hierarchical clustering.
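A minimal sketch of that suggestion, assuming SciPy is available: `scipy.cluster.hierarchy.linkage` accepts a condensed distance matrix, so a precomputed matrix can be used directly. The 4x4 matrix, the average linkage method, and the choice of two clusters are illustrative assumptions, not values from the question.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical symmetric distance matrix with a zero diagonal:
# items 0 and 1 are close, items 2 and 3 are close.
D = np.array([
    [0.0, 1.0, 9.0, 8.0],
    [1.0, 0.0, 8.0, 9.0],
    [9.0, 8.0, 0.0, 1.0],
    [8.0, 9.0, 1.0, 0.0],
])

# linkage expects the condensed (upper-triangle) form of the matrix.
condensed = squareform(D, checks=True)

# Average linkage only needs pairwise distances, never the raw features.
Z = linkage(condensed, method="average")

# Cut the dendrogram into at most two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Average linkage is used here because it is defined purely in terms of pairwise distances; methods such as Ward assume Euclidean geometry and may be less appropriate for string distances.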