Spherical k-means implementation in Python

Question

I've been using scipy's k-means for quite some time now, and I'm pretty happy about the way it works in terms of usability and efficiency. However, now I want to explore different k-means variants, more specifically, I'd like to apply spherical k-means in some of my problems.

Do you know any good Python implementation (i.e. similar to scipy's k-means) of spherical k-means? If not, how hard would it be to modify scipy's source code to adapt its k-means algorithm to be spherical?

Thanks.

Recommended answer

In spherical k-means, you want to guarantee that the centers lie on the unit sphere, so you can adjust the algorithm to use the cosine distance and additionally normalize the centroids of the final result.

When using the Euclidean distance, I prefer to think of the algorithm as projecting the cluster centers onto the unit sphere in each iteration, i.e., the centers should be normalized after each maximization step.
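The procedure described above can be sketched with plain NumPy. This is only a minimal illustration, not the code of scipy or of any particular library: the function names `normalize_rows` and `spherical_kmeans`, the random initialization from data points, and the handling of empty clusters are all assumptions made for the example.

```python
import numpy as np

def normalize_rows(X, eps=1e-12):
    """Scale each row to unit Euclidean length (eps guards against division by zero)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)

def spherical_kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical helper: k-means on the unit sphere using cosine similarity."""
    rng = np.random.default_rng(seed)
    X = normalize_rows(np.asarray(X, dtype=float))
    # Simple initialization (an assumption): pick k distinct data points as centers.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment step: for unit vectors, the largest dot product is the
        # largest cosine similarity, i.e. the smallest cosine distance.
        new_labels = np.argmax(X @ centers.T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step: sum the members of each cluster ...
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous center
                centers[j] = members.sum(axis=0)
        # ... then project the centers back onto the unit sphere.
        centers = normalize_rows(centers)
    return centers, labels

# Toy data: two groups of 2-D points pointing in roughly orthogonal directions.
data_rng = np.random.default_rng(1)
X = np.vstack([
    data_rng.normal([5.0, 0.0], 0.3, size=(50, 2)),
    data_rng.normal([0.0, 5.0], 0.3, size=(50, 2)),
])
centers, labels = spherical_kmeans(X, k=2)
```

Normalizing after every update step, rather than only at the end, keeps the centers on the sphere throughout, which is what makes the Euclidean and cosine views of the assignment step agree.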

Indeed, when the centers and data points are both normalized, there is a one-to-one relationship between the cosine distance and the Euclidean distance:

|a - b|_2^2 = 2 * (1 - cos(a, b))

This follows from |a - b|_2^2 = |a|^2 + |b|^2 - 2 * (a . b) with |a| = |b| = 1.
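As a quick sanity check, the relationship can be verified numerically. Note that it is the squared Euclidean distance that equals 2 * (1 - cos(a, b)) for unit vectors. The vectors below are random test data chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random vectors, each normalized to unit length.
a = rng.normal(size=5)
a /= np.linalg.norm(a)
b = rng.normal(size=5)
b /= np.linalg.norm(b)

sq_euclid = np.sum((a - b) ** 2)  # squared Euclidean distance |a - b|_2^2
cos_dist = 1.0 - a @ b            # cosine distance; a @ b is cos(a, b) for unit vectors

# The two quantities differ only by a factor of 2.
identity_holds = np.isclose(sq_euclid, 2.0 * cos_dist)
```

Because the squared distance is a monotonic function of the cosine distance, both give the same nearest center for any normalized point.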

The package jasonlaska/spherecluster modifies scikit-learn's k-means into spherical k-means and also provides another sphere clustering algorithm.
