Passing a distance matrix to k-means clustering in sklearn


Question

According to the sklearn KMeans documentation, k-means requires a matrix of shape (n_samples, n_features). However, I passed it a distance matrix of shape (n_samples, n_samples), where each entry holds the distance between two strings. The time series were converted into strings using the SAX representation.

When I ran the clustering on the distance matrix, it gave good results. What could explain this? As far as I know, k-medoids is the algorithm that works with a distance matrix.

Recommended answer

K-means, as the name indicates, uses means.

Computing the arithmetic mean requires access to the original features; a distance matrix cannot be used for that.

K-means also does not use pairwise distances, so a distance matrix is of no use to this algorithm.
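A minimal sketch of what happens when sklearn's KMeans is handed a square distance matrix, and one plausible reason the result can still look good. KMeans cannot tell that the input holds distances, so it reads each row as an n_samples-dimensional feature vector and averages those rows. Objects that are close to each other tend to have similar distance profiles, so their rows end up near each other in that space. The toy points below are hypothetical and stand in for the SAX strings from the question.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated groups of 1-D points.
points = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])

# Pairwise absolute-difference distance matrix, shape (n_samples, n_samples).
dist = np.abs(points - points.T)

# KMeans treats `dist` as a feature matrix with 6 samples and 6 "features".
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(dist)

# The centroids are means of distance-matrix rows, so they live in the
# 6-dimensional row space, not in the space of the original points.
print(km.cluster_centers_.shape)  # (2, 6)
print(km.labels_)
```

The grouping recovered here matches the two obvious groups, but only because rows of the distance matrix happen to work as features for this data. It is not k-means operating on the distances themselves.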

Choose a different algorithm instead, such as hierarchical clustering.
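As an illustration of that suggestion, here is a minimal sketch of hierarchical clustering driven directly by a precomputed distance matrix. It uses SciPy rather than sklearn, which is a choice made for this example and not something the answer prescribes. The 4x4 matrix, the average linkage method, and the target of two clusters are all assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical symmetric distance matrix for 4 items (zero diagonal).
dist = np.array([
    [0.0, 1.0, 6.0, 7.0],
    [1.0, 0.0, 5.0, 6.0],
    [6.0, 5.0, 0.0, 2.0],
    [7.0, 6.0, 2.0, 0.0],
])

# linkage() expects the condensed (upper-triangle) form of the matrix.
condensed = squareform(dist)

# Average linkage needs only pairwise distances, never the raw features.
Z = linkage(condensed, method="average")

# Cut the dendrogram so that at most 2 flat clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

sklearn's AgglomerativeClustering can also take a precomputed matrix, though the name of the parameter that enables this has changed between releases, so check the documentation for the installed version.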
