Kmeans Matlab“在迭代1处创建的空集群".错误 [英] Kmeans matlab "Empty cluster created at iteration 1" error
Problem description
I'm using this script to cluster a set of 3D points with the MATLAB kmeans function, but I always get the error "Empty cluster created at iteration 1". The script I'm using:
[G,C] = kmeans(XX, K, 'distance','sqEuclidean', 'start','sample');
XX can be found at this link (XX value), and K is set to 3. Could anyone please advise me why this is happening?
Recommended answer
It is simply telling you that during the assign-recompute iterations, a cluster became empty (lost all its assigned points). This is usually caused by an inadequate cluster initialization, or by the data having fewer inherent clusters than you specified.
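To make that failure mode concrete, here is a minimal sketch of one assign-recompute step in plain MATLAB (no toolbox needed). The toy data X and the deliberately poor starting centroids C0 are hypothetical: the third centroid starts far from every point, so it receives no points in the assignment step, and the recompute step then has nothing to average. The expression X - C0(k,:) relies on implicit expansion (R2016b or later).

```matlab
% Hypothetical toy data: two tight groups of 2-D points.
X = [0 0; 0.1 0; 0 0.1; 5 5; 5.1 5; 5 5.1];
% Hypothetical poor initialization: the third centroid is far from all data.
C0 = [0 0; 5 5; 100 100];
K = size(C0, 1);

% Assignment step: squared Euclidean distance from every point to every centroid.
D = zeros(size(X, 1), K);
for k = 1:K
    D(:, k) = sum((X - C0(k, :)).^2, 2);
end
[~, idx] = min(D, [], 2);

% Number of points assigned to each cluster; a zero means an empty cluster.
counts = accumarray(idx, 1, [K 1]);
disp(counts')          % 3 3 0

% Recompute step: the mean of an empty set is NaN, so centroid 3 is undefined.
newC3 = mean(X(idx == 3, :), 1);
disp(newC3)            % NaN NaN
```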
Try changing the initialization method using the start option. kmeans provides four possible techniques to initialize clusters:
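- sample: select K observations from the data at random as the initial centroids (the default in older releases; newer releases default to 'plus', i.e. k-means++ seeding)
- uniform: select K points uniformly at random from the range of the data
- cluster: perform a preliminary clustering phase on a random 10% subsample of the data
- manual: specify the initial centroids yourself by passing a K-by-P matrix as the start value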
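A short sketch of the start options listed above, assuming the Statistics and Machine Learning Toolbox is installed. The random data X and the matrix C0 are hypothetical stand-ins for XX; the last call shows the "manual" option, which is selected by passing a K-by-P matrix of initial centroids rather than a keyword. 'EmptyAction','singleton' is passed to every call so that an unlucky start does not abort the example.

```matlab
% Assumes the Statistics and Machine Learning Toolbox (kmeans).
rng(0);                                   % make the random choices reproducible
X = [randn(100, 3); randn(100, 3) + 5];   % hypothetical 3-D data: two blobs
K = 2;
opts = {'EmptyAction', 'singleton'};      % avoid an error if a cluster empties

[idxS, CS] = kmeans(X, K, 'Start', 'sample',  opts{:});  % K random observations
[idxU, CU] = kmeans(X, K, 'Start', 'uniform', opts{:});  % K uniform points in range of X
[idxC, CC] = kmeans(X, K, 'Start', 'cluster', opts{:});  % preliminary run on 10% subsample

% "Manual" initialization: a K-by-P matrix of starting centroids (hypothetical values).
C0 = [0 0 0; 5 5 5];
[idxM, CM] = kmeans(X, K, 'Start', C0, opts{:});
```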
You can also try the different values of the emptyaction option ('error', 'drop', or 'singleton'), which tells MATLAB what to do when a cluster becomes empty.
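The toy setup below (hypothetical data and starting centroids; Statistics and Machine Learning Toolbox assumed) forces the third cluster to become empty at iteration 1. With 'EmptyAction','error' it should reproduce the error from the question. The other two settings react differently: 'drop' removes the cluster (its row of C is returned as NaN), and 'singleton' restarts it from the single point farthest from its centroid.

```matlab
% Assumes the Statistics and Machine Learning Toolbox (kmeans).
% Hypothetical data: two tight groups; the third start centroid is far from both.
X  = [0 0; 0.1 0; 0 0.1; 5 5; 5.1 5; 5 5.1];
C0 = [0 0; 5 5; 100 100];

% 'error': stop with an "Empty cluster created at iteration 1" style error.
threw = false;
try
    kmeans(X, 3, 'Start', C0, 'EmptyAction', 'error');
catch err
    threw = true;
    disp(err.message)
end

% 'drop': discard the empty cluster; its centroid row is returned as NaN.
[idxD, CD] = kmeans(X, 3, 'Start', C0, 'EmptyAction', 'drop');

% 'singleton': reseed the empty cluster with the point farthest from its centroid.
[idxS, CS] = kmeans(X, 3, 'Start', C0, 'EmptyAction', 'singleton');
```

'drop' also issues a warning, and the number of clusters actually returned can then be smaller than K.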
Ultimately, I think you need to reduce the number of clusters, i.e., try K=2 clusters.
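One common heuristic for checking whether a smaller K fits better (not part of the original answer) is to compare the total within-cluster sum of distances, the third output sumd of kmeans, across several values of K, and look for the point where it stops dropping sharply. A sketch, assuming the Statistics and Machine Learning Toolbox and hypothetical random data in place of XX:

```matlab
% Assumes the Statistics and Machine Learning Toolbox (kmeans).
rng(0);                                   % reproducible results
X = [randn(100, 3); randn(100, 3) + 6];   % hypothetical data with two blobs
maxK = 5;
totalSSE = zeros(1, maxK);

for K = 1:maxK
    % sumd(k) is the sum of point-to-centroid distances in cluster k
    % (squared Euclidean with the default 'Distance' setting).
    [~, ~, sumd] = kmeans(X, K, 'Replicates', 5, 'EmptyAction', 'singleton');
    totalSSE(K) = sum(sumd);
end

disp(totalSSE)   % expect a large drop from K=1 to K=2, then small gains
```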
I tried to visualize your data to get a feel for it:
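% Load the point matrix XX.
load matlab_X.mat
figure('renderer','zbuffer')
% Plot each 3-D point as a small dot, with no connecting lines.
line(XX(:,1), XX(:,2), XX(:,3), ...
    'LineStyle','none', 'Marker','.', 'MarkerSize',1)
% Fixed aspect ratio for rotating, default 3-D view.
axis vis3d; view(3); grid on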
After some manual zooming/panning, it looks like a silhouette of a person:
You can see that the data of 307200 points is really dense and compact, which confirms what I suspected: the data doesn't have that many clusters.
Here is the code I tried:
>> [IDX,C] = kmeans(XX, 3, 'start','uniform', 'emptyaction','singleton');
>> tabulate(IDX)
Value Count Percent
1 18023 5.87%
2 264690 86.16%
3 24487 7.97%
What's more, all the points in cluster 2 are duplicates of a single point, [0 0 0]:
>> unique(XX(IDX==2,:),'rows')
ans =
0 0 0
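Before clustering, a quick way to see how much of the data consists of repeated points is to count the multiplicity of each distinct row. A minimal sketch in base MATLAB; the small matrix below is a hypothetical stand-in for XX:

```matlab
% Hypothetical stand-in for XX: two real points plus repeated [0 0 0] rows.
XX = [-120 -300 1100; -80 -250 1200; 0 0 0; 0 0 0; 0 0 0];

isZero = all(XX == 0, 2);                      % rows that are exactly [0 0 0]
fprintf('%d of %d rows are [0 0 0]\n', nnz(isZero), size(XX, 1));

% Distinct rows and how many times each one occurs.
[U, ~, ic] = unique(XX, 'rows');
counts = accumarray(ic, 1);
[maxCount, iMax] = max(counts);
fprintf('most repeated row [%s] occurs %d times\n', num2str(U(iMax, :)), maxCount);
```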
The other two clusters look like this:
% One distinct color per cluster.
clr = lines(max(IDX));
for i=1:max(IDX)
    % Plot the points of cluster i in its own color.
    line(XX(IDX==i,1), XX(IDX==i,2), XX(IDX==i,3), ...
        'Color',clr(i,:), 'LineStyle','none', 'Marker','.', 'MarkerSize',1)
end
So you might get better clusters if you remove duplicate points first...
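Two ways to do that are sketched below on hypothetical data. unique(XX,'rows'), which the answer uses further down, keeps one copy of every distinct point, including a single [0 0 0]. Dropping only the all-zero rows (a hypothetical alternative, useful if those rows are placeholders rather than real measurements) removes every [0 0 0] but keeps other legitimate repeats.

```matlab
% Hypothetical data: one point is repeated twice, [0 0 0] three times.
XX = [-120 -300 1100; -80 -250 1200; -80 -250 1200; 0 0 0; 0 0 0; 0 0 0];

% Option A: keep one copy of each distinct row (one [0 0 0] survives).
XXa = unique(XX, 'rows');

% Option B: remove only the all-zero rows; other duplicates are kept.
XXb = XX(~all(XX == 0, 2), :);
```

In the answer's own pipeline, the one [0 0 0] left by unique is removed anyway by the z-range filter, since z = 0 lies outside [900, 1500].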
In addition, you have a few outliers that might affect the result of clustering. Visually, I narrowed down the range of the data to the following intervals, which encompass most of the data:
>> xlim([-500 100])
>> ylim([-500 100])
>> zlim([900 1500])
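The limits above were chosen by eye. As a hypothetical automatic alternative (not what the answer did), base MATLAB's isoutlier (R2017a or later) flags values more than three scaled median absolute deviations from each column's median, and a row can then be dropped if any of its coordinates is flagged. The test data is hypothetical, and the default threshold may need tuning for a non-Gaussian shape such as a body silhouette.

```matlab
rng(0);                                   % reproducible example
X = [randn(200, 3); 1e4 1e4 1e4];         % hypothetical cloud plus one far outlier

% isoutlier works column by column on a matrix and returns a logical array of
% the same size (default: more than 3 scaled MADs from the median).
flagged = isoutlier(X);
keep = ~any(flagged, 2);                  % keep rows with no flagged coordinate
Xclean = X(keep, :);

fprintf('removed %d of %d rows\n', nnz(~keep), size(X, 1));
```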
Here is the result after removing duplicate points (over 250K points) and outliers (around 250 data points), and clustering with K=3 (best of 5 runs using the replicates option):
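% Keep one copy of each distinct point.
XX = unique(XX,'rows');
% Drop points outside the visually chosen x, y and z ranges.
XX(XX(:,1) < -500 | XX(:,1) > 100, :) = [];
XX(XX(:,2) < -500 | XX(:,2) > 100, :) = [];
XX(XX(:,3) < 900 | XX(:,3) > 1500, :) = [];
% Run kmeans 5 times and keep the solution with the lowest total distance.
[IDX,C] = kmeans(XX, 3, 'replicates',5);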
with almost an equal split across the three clusters:
>> tabulate(IDX)
Value Count Percent
1 15605 36.92%
2 15048 35.60%
3 11613 27.48%
Recall that the default distance measure is squared Euclidean distance ('sqeuclidean'), which explains the shape of the formed clusters.
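As a check on what the default measure means, the sketch below (Statistics and Machine Learning Toolbox assumed, hypothetical random data) recomputes each cluster's sumd as the sum of squared Euclidean distances from its points to its centroid. Because points are assigned to the nearest centroid under this measure, the clusters tend to be compact, roughly convex regions separated by flat boundaries, whatever the shape of the underlying object.

```matlab
% Assumes the Statistics and Machine Learning Toolbox (kmeans).
rng(0);
X = [randn(50, 3); randn(50, 3) + 4];     % hypothetical 3-D data

% Default 'Distance' is 'sqeuclidean'; sumd(k) is cluster k's total distance.
[idx, C, sumd] = kmeans(X, 2);

% Recompute sumd by hand: squared Euclidean distance of each point to its
% own centroid (row idx(i) of C), summed per cluster.
d2 = sum((X - C(idx, :)).^2, 2);
mySumd = accumarray(idx, d2);
```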
这篇关于Kmeans Matlab“在迭代1处创建的空集群".错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!