Initial centroids for scikit-learn KMeans clustering

Question

If I already have a NumPy array that can serve as the initial centroids, how can I properly initialize the k-means algorithm? I am using the scikit-learn KMeans class.

This post (k-means with selected initial centers) indicates that I only need to set n_init=1 if I am using a NumPy array as the initial centroids, but I am not sure whether my initialization is working properly.

Naftali Harris's excellent visualization page shows what I am trying to do: http://www.naftaliharris.com/blog/visualizing-k-means-clustering/

On that page: "I'll choose" --> "Packed Circles" --> run k-means.

import numpy as np
import sklearn.cluster as sk

# numpy array of initial centroids: 8 clusters x 4 features
startpts = np.array([[-0.12, 0.939, 0.321, 0.011], [0.0, 0.874, -0.486, 0.862], [0.0, 1.0, 0.0, 0.033], [0.12, 0.939, 0.321, -0.7], [0.0, 1.0, 0.0, -0.203], [0.12, 0.939, -0.321, 0.25], [0.0, 0.874, 0.486, -0.575], [-0.12, 0.939, -0.321, 0.961]], np.float64)

# actual_data_points: the (n_samples, 4) array being clustered
centroids = sk.KMeans(n_clusters=8, init=startpts, n_init=1)
centroids.fit(actual_data_points)

# get the array of final cluster centers
centroids_array = centroids.cluster_centers_
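To check whether the array is actually being used, one option is sketched below under stated assumptions. The data is hypothetical synthetic data standing in for `actual_data_points` (8 well-separated blobs in 4 dimensions), and `startpts` is placed slightly off the blob centers. Since an explicit `init` array fixes the starting point, fits with different `random_state` values should agree, and each fitted center should stay close to the row of `startpts` with the same index.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical synthetic data (an assumption): 8 blobs in 4 dimensions,
# standing in for actual_data_points, which the question does not show.
rng = np.random.default_rng(0)
true_centers = rng.uniform(-5, 5, size=(8, 4))
X = np.vstack([c + 0.1 * rng.standard_normal((50, 4)) for c in true_centers])

# Initial centroids: the true blob centers, shifted slightly.
startpts = true_centers + 0.05

results = []
for seed in (0, 1, 2):
    # With an explicit init array, random_state should not change the start,
    # so every run is expected to converge to the same centers.
    km = KMeans(n_clusters=8, init=startpts, n_init=1, random_state=seed)
    km.fit(X)
    results.append(km.cluster_centers_)

same = all(np.allclose(results[0], r) for r in results[1:])
print("identical across seeds:", same)

# Cluster i starts from row i of startpts, so with well-separated blobs
# each fitted center should end up near its own starting row.
print("max shift from init:", np.abs(results[0] - startpts).max())
```

As far as I know, recent scikit-learn releases (1.4 and later) default to `n_init='auto'`, which already does a single run when `init` is an array; passing `n_init=1` explicitly keeps the intent clear on older versions as well.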

Answer

Yes, setting initial centroids via init should work. Here's a quote from the scikit-learn documentation:

 init : {‘k-means++’, ‘random’ or an ndarray}

     Method for initialization, defaults to ‘k-means++’:   

     If an ndarray is passed, it should be of shape (n_clusters, n_features)
     and gives the initial centers.
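To illustrate what "gives the initial centers" means in practice, here is a minimal NumPy sketch of a single Lloyd iteration starting from a given center array. The function name `lloyd_step` is hypothetical, and this is a simplified illustration of the standard k-means update, not scikit-learn's implementation (for example, scikit-learn handles empty clusters differently).

```python
import numpy as np

def lloyd_step(X, centers):
    """Hypothetical helper: one k-means (Lloyd) iteration from the given centers.

    Assigns every row of X to its nearest center, then moves each center to the
    mean of its assigned points. Empty clusters simply keep their old position.
    """
    # Squared Euclidean distances, shape (n_samples, n_clusters)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    new_centers = centers.copy()
    for k in range(centers.shape[0]):
        members = X[labels == k]
        if len(members):
            new_centers[k] = members.mean(axis=0)
    return labels, new_centers

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
init = np.array([[1.0, 1.0], [9.0, 9.0]])  # shape (n_clusters, n_features) = (2, 2)
labels, centers = lloyd_step(X, init)
print(labels)   # [0 0 1 1]
print(centers)  # centers -> [[0.0, 0.5], [10.0, 10.5]]
```

Row k of the initial array seeds cluster k, which is why the array must have one row per cluster.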


What is the shape (n_clusters, n_features) referring to?

The shape requirement means that init must have exactly n_clusters rows, and the number of elements in each row should match the dimensionality of actual_data_points:

>>> init = np.array([[-0.12, 0.939, 0.321, 0.011],
...                  [0.0, 0.874, -0.486, 0.862],
...                  [0.0, 1.0, 0.0, 0.033],
...                  [0.12, 0.939, 0.321, -0.7],
...                  [0.0, 1.0, 0.0, -0.203],
...                  [0.12, 0.939, -0.321, 0.25],
...                  [0.0, 0.874, 0.486, -0.575],
...                  [-0.12, 0.939, -0.321, 0.961]],
...                 np.float64)
>>> init.shape[0] == 8  # n_clusters
True
>>> init.shape[1] == actual_data_points.shape[1]  # n_features
True
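The same two checks can be wrapped in a small validation function that runs before fitting. This is a hypothetical helper (`check_init_shape` is not part of scikit-learn); it only mirrors the (n_clusters, n_features) requirement quoted from the documentation.

```python
import numpy as np

def check_init_shape(init, X, n_clusters):
    """Hypothetical helper: verify that an initial-centroid array fits KMeans.

    init must be 2-D, with exactly n_clusters rows and as many columns as X has features.
    """
    init = np.asarray(init, dtype=np.float64)
    if init.ndim != 2:
        raise ValueError(f"init must be 2-D, got {init.ndim}-D")
    if init.shape[0] != n_clusters:
        raise ValueError(f"init has {init.shape[0]} rows, expected n_clusters={n_clusters}")
    if init.shape[1] != X.shape[1]:
        raise ValueError(f"init has {init.shape[1]} columns, but X has {X.shape[1]} features")
    return init

X = np.zeros((100, 4))    # 100 samples, n_features = 4
good = np.zeros((8, 4))   # (n_clusters, n_features)
check_init_shape(good, X, n_clusters=8)  # passes silently

try:
    check_init_shape(np.zeros((8, 3)), X, n_clusters=8)
except ValueError as e:
    print(e)  # init has 3 columns, but X has 4 features
```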

What is n_features?

n_features is the dimensionality of your samples, i.e. the number of columns in your data array. For instance, if you were to cluster points on a 2D plane, n_features would be 2.
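A small 2D sketch of this, using made-up points (an assumption for illustration): every row is an (x, y) pair, so `X.shape[1]` is 2, and an initial-centroid array for three clusters therefore has shape (3, 2).

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up points on a 2D plane: each row is (x, y), so n_features = 2.
X = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
              [5.0, 5.0], [5.5, 5.0], [5.0, 5.5],
              [0.0, 5.0], [0.5, 5.0], [0.0, 5.5]])
n_features = X.shape[1]

# Three initial centroids, one per row: shape (n_clusters, n_features) = (3, 2)
init = np.array([[1.0, 1.0], [4.0, 4.0], [1.0, 4.0]])

km = KMeans(n_clusters=3, init=init, n_init=1).fit(X)
print(n_features)                 # 2
print(km.cluster_centers_.shape)  # (3, 2)
print(km.labels_)                 # [0 0 0 1 1 1 2 2 2]
```

Because each group of three points is nearest to a different row of `init`, cluster k ends up around the points closest to row k.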