Kmeans matlab "Empty cluster created at iteration 1" error


Problem description

I'm using this script to cluster a set of 3D points with the MATLAB kmeans function, but I always get the error "Empty cluster created at iteration 1". The script I'm using:

[G,C] = kmeans(XX, K, 'distance','sqEuclidean', 'start','sample');

XX can be found at this link (XX value), and K is set to 3. Could anyone please advise me why this is happening?

Recommended answer

It is simply telling you that during the assign-recompute iterations, a cluster became empty (lost all of its assigned points). This is usually caused by a poor cluster initialization, or by the data having fewer inherent clusters than you specified.

Try changing the initialization method using the start option. kmeans provides four possible techniques to initialize clusters:

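  • sample: sample K points randomly from the data as the initial clusters (the default when this was written)
  • uniform: select K points uniformly across the range of the data
  • cluster: perform a preliminary clustering on a small subset
  • manual: manually specify the initial clusters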
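The four initializations listed above can be tried roughly as follows. This is only a sketch on synthetic data: X, K, opts and C0 are made-up stand-ins, not the asker's XX, and 'emptyaction','singleton' is added so that an unlucky starting point does not abort the run.

```matlab
% Synthetic stand-in for XX: three loose 3D blobs (assumption, not the real data).
rng(0);                                        % reproducible random numbers
X = [randn(100,3); randn(100,3) + 5; randn(100,3) - 5];
K = 3;
opts = {'emptyaction', 'singleton'};           % keep going if a cluster empties

% The three built-in choices from the list.
idxSample  = kmeans(X, K, 'start', 'sample',  opts{:});
idxUniform = kmeans(X, K, 'start', 'uniform', opts{:});
idxCluster = kmeans(X, K, 'start', 'cluster', opts{:});

% Manual initialization: pass a K-by-3 matrix of starting centroids.
C0 = [0 0 0; 5 5 5; -5 -5 -5];                 % hypothetical starting guesses
[idxManual, C] = kmeans(X, K, 'start', C0, opts{:});
```

Comparing the cluster sizes from each call (for example with tabulate) gives a quick feel for how sensitive the result is to the starting point.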

You can also try different values of the emptyaction option, which tells MATLAB what to do when a cluster becomes empty.
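To see what the emptyaction values do, the sketch below forces an empty cluster on purpose by placing one starting centroid far away from some synthetic data. X and C0 are hypothetical; 'error', 'drop' and 'singleton' are the documented values of the option.

```matlab
% Two synthetic blobs (assumption: stand-in data, not the asker's XX).
rng(0);
X = [randn(100,3); randn(100,3) + 5];

% The third starting centroid is so far away that no point is assigned to it.
C0 = [0 0 0; 5 5 5; 1000 1000 1000];

% 'error': kmeans stops with the "Empty cluster created" error.
gotError = false;
try
    kmeans(X, 3, 'start', C0, 'emptyaction', 'error');
catch
    gotError = true;
end

% 'drop': the empty cluster is removed, so only two labels remain.
idxDrop = kmeans(X, 3, 'start', C0, 'emptyaction', 'drop');

% 'singleton': the empty cluster is re-seeded with the point furthest
% from its centroid, so all three labels are used.
idxSingle = kmeans(X, 3, 'start', C0, 'emptyaction', 'singleton');
```

With 'drop', kmeans may still print a warning about the empty cluster; that is expected here.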

Ultimately, I think you need to reduce the number of clusters, i.e. try K=2.
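One way to compare candidate values of K, which is not something done in this answer, is to look at the mean silhouette value for each K. The sketch below assumes synthetic data with two obvious groups; X, Ks and bestK are illustrative names.

```matlab
% Two well-separated synthetic blobs (assumption: not the asker's data).
rng(0);
X = [randn(150,3); randn(150,3) + 10];

Ks = 2:4;                          % candidate numbers of clusters
meanSil = zeros(size(Ks));
for j = 1:numel(Ks)
    idx = kmeans(X, Ks(j), 'replicates', 5, 'emptyaction', 'singleton');
    s = silhouette(X, idx);        % one silhouette value per point
    meanSil(j) = mean(s);          % higher means better-separated clusters
end

[~, best] = max(meanSil);
bestK = Ks(best);
fprintf('K with the highest mean silhouette: %d\n', bestK);
```

silhouette computes pairwise distances, so on a data set with hundreds of thousands of points it would be wise to run it on a random subsample.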

I tried to visualize your data to get a feel for it:

load matlab_X.mat
figure('renderer','zbuffer')
line(XX(:,1), XX(:,2), XX(:,3), ...
    'LineStyle','none', 'Marker','.', 'MarkerSize',1)
axis vis3d; view(3); grid on

After some manual zooming/panning, it looks like the silhouette of a person.

You can see that the data of 307200 points is really dense and compact, which confirms what I suspected: the data doesn't have that many clusters.

This is the code I tried:

>> [IDX,C] = kmeans(XX, 3, 'start','uniform', 'emptyaction','singleton');
>> tabulate(IDX)
  Value    Count   Percent
      1    18023      5.87%
      2    264690     86.16%
      3    24487      7.97%

What's more, the points in cluster 2 are all duplicates of a single point ([0 0 0]):

>> unique(XX(IDX==2,:),'rows')
ans =
     0     0     0
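To check how many copies of each point exist before clustering, something along these lines should work. The small matrix X is a made-up example; on the real data you would pass XX instead.

```matlab
% Small hypothetical point set with a heavily repeated [0 0 0] row.
X = [zeros(5,3); 1 2 3; 1 2 3; 4 5 6];

% U holds the distinct rows; ic maps every row of X to its row in U.
[U, ~, ic] = unique(X, 'rows');
counts = accumarray(ic(:), 1);     % number of copies of each distinct row

[maxCount, k] = max(counts);
fprintf('Most repeated point: [%g %g %g] (%d copies)\n', U(k,:), maxCount);
```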

The other two clusters look like this:

clr = lines(max(IDX));
for i=1:max(IDX)
    line(XX(IDX==i,1), XX(IDX==i,2), XX(IDX==i,3), ...
        'Color',clr(i,:), 'LineStyle','none', 'Marker','.', 'MarkerSize',1)
end

So you might get better clusters if you remove the duplicate points first...

In addition, you have a few outliers that might affect the clustering result. Visually, I narrowed down the range of the data to the following intervals, which encompass most of the data:

>> xlim([-500 100])
>> ylim([-500 100])
>> zlim([900 1500])

Here is the result after removing duplicate points (over 250K points) and outliers (around 250 data points), and clustering with K=3 (best out of 5 runs with the replicates option):

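XX = unique(XX,'rows');                        % drop duplicate points
XX(XX(:,1) < -500 | XX(:,1) > 100, :) = [];    % drop outliers along x
XX(XX(:,2) < -500 | XX(:,2) > 100, :) = [];    % drop outliers along y
XX(XX(:,3) < 900 | XX(:,3) > 1500, :) = [];    % drop outliers along z

[IDX,C] = kmeans(XX, 3, 'replicates',5);       % keep the best of 5 runs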

with an almost equal split across the three clusters:

>> tabulate(IDX)
  Value    Count   Percent
      1    15605     36.92%
      2    15048     35.60%
      3    11613     27.48%

Recall that the default distance function is the (squared) Euclidean distance, which explains the shape of the formed clusters.
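If the cluster shapes matter, the distance option can be changed. The sketch below compares 'sqEuclidean' with 'cityblock' on synthetic data. X is a stand-in, and which metric suits the real point cloud is not something this answer tested.

```matlab
% Two synthetic blobs (assumption: not the asker's data).
rng(0);
X = [randn(100,3); randn(100,3) + 5];

% Squared Euclidean distance: each centroid is the mean of its points.
[idxE, Ce] = kmeans(X, 2, 'distance', 'sqEuclidean', 'replicates', 3);

% City-block (L1) distance: each centroid is the component-wise median.
[idxC, Cc] = kmeans(X, 2, 'distance', 'cityblock', 'replicates', 3);

% Sanity check of the first claim: gap between centroid 1 and its cluster mean.
gap = norm(Ce(1,:) - mean(X(idxE == 1, :), 1));
fprintf('Centroid-to-mean gap (sqEuclidean): %g\n', gap);
```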
