k-means with selected initial centers

Problem description

I am trying to run k-means clustering with selected initial centroids. The scikit-learn documentation says this about specifying the initial centers:

init : {‘k-means++’, ‘random’ or an ndarray} 

If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.

My code in Python:

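import numpy as np
from sklearn.cluster import KMeans

# Initial centers, shape (n_clusters, n_features); `data` is my dataset (not shown)
X = np.array([[-19.0748, -8.536],
              [22.0108, -10.9737],
              [12.6597, 19.2601]], np.float64)
km = KMeans(n_clusters=3, init=X).fit(data)
# print(km)
centers = km.cluster_centers_
print(centers)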

This produces a warning:

RuntimeWarning: Explicit initial center position passed: performing only one init in k-means instead of n_init=10
  n_jobs=self.n_jobs)

and returns the same initial centers. Any idea how to specify the initial centers so that they are accepted?

Solution

The default behavior of KMeans is to run the algorithm several times, each time starting from a different randomly seeded set of initial centroids (k-means++ seeding by default; init='random' picks observations at random, i.e. the Forgy method). The number of these initializations is controlled by the n_init= parameter (docs):

n_init : int, default: 10

Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.
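To make "best output in terms of inertia" concrete, here is a rough sketch that does by hand what n_init does in spirit: fit several times from different random starts and keep the fit with the lowest inertia. It is not scikit-learn's internal code; the toy `data`, the blob means, and the choice of five seeds are assumptions made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: three loose 2-D blobs (not the asker's dataset).
rng = np.random.default_rng(0)
blob_means = np.array([[-20.0, -8.0], [22.0, -11.0], [12.0, 19.0]])
data = np.vstack([m + rng.normal(scale=3.0, size=(40, 2)) for m in blob_means])

# One single-initialization fit per seed, each from a different random start.
fits = [
    KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(data)
    for seed in range(5)
]
inertias = [km.inertia_ for km in fits]

# Keep the fit with the lowest inertia
# (sum of squared distances of samples to their closest center).
best = min(fits, key=lambda km: km.inertia_)
print(inertias)
print(best.cluster_centers_)
```

With well-separated blobs like these, most seeds tend to reach the same solution, so the inertias may be nearly identical. The repeats matter more when a bad random start can get stuck in a poor local optimum.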

If you pass an array as the init= argument then only a single initialization will be performed using the centroids explicitly specified in the array. You are getting a RuntimeWarning because you are still passing the default value of n_init=10 (here are the relevant lines of source code).

It's actually totally fine to ignore this warning, but you can make it go away completely by passing n_init=1 if your init= parameter is an array.
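As a minimal sketch of that fix, the snippet below passes the initial centers from the question together with n_init=1 and checks that the "Explicit initial center position" warning is not raised. The asker's `data` is not shown, so the synthetic blobs here (and the `blob_means` offsets) are assumptions standing in for it.

```python
import warnings

import numpy as np
from sklearn.cluster import KMeans

# Initial centers from the question, shape (n_clusters, n_features) = (3, 2).
init_centers = np.array([[-19.0748, -8.536],
                         [22.0108, -10.9737],
                         [12.6597, 19.2601]], dtype=np.float64)

# Hypothetical stand-in for `data`: blobs near, but not exactly at, the
# initial centers, so the fitted centers can move away from them.
rng = np.random.default_rng(0)
blob_means = init_centers + np.array([[3.0, -2.0], [-4.0, 1.0], [2.0, 5.0]])
data = np.vstack([m + rng.normal(scale=2.0, size=(50, 2)) for m in blob_means])

# Record warnings so we can check whether the explicit-init warning fires.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    km = KMeans(n_clusters=3, init=init_centers, n_init=1).fit(data)

explicit_init_warnings = [
    w for w in caught if "Explicit initial center" in str(w.message)
]
centers = km.cluster_centers_
print(centers)
print(len(explicit_init_warnings))
```

The fitted centers only differ from the initial ones if the data pull them elsewhere. If your starting centers already sit at a converged solution for your data, getting the same values back is expected. In more recent scikit-learn releases the default for n_init has, to my knowledge, changed to 'auto', which already uses a single run for an array init, so the warning may not appear there even without n_init=1.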
