Nearest point between two clusters in MATLAB


Question

I have a set of clusters consisting of 3D points. I want to get the nearest two points for each pair of clusters.

For example: I have 5 clusters, C1 to C5, each consisting of 3D points. For C1 and C2 there are two points, Pc1 (a point in C1) and Pc2 (a point in C2), that are the closest two points between the two clusters. The same holds between C1 and C3..C5, between C2 and C3..C5, and so on. After that I'll have 20 points representing the nearest points between the different clusters.

The second thing is that I want to connect these points together if the distance between them is less than a certain threshold.

So I'm asking if anyone could please advise me.

Update:

Thanks, Amro, for your answer. I've changed the call to CIDX = kmeans(X, K, 'distance','cityblock', 'replicates',5); to solve the empty-cluster error. But then another error appeared: "pdistmex Out of memory. Type HELP MEMORY for your options." So I checked your answer to "Out of memory error while using clusterdata in MATLAB" and updated your code as shown below. The problem now is an indexing error at the line mn = min(min(D(idx1,idx2)));. Is there a workaround for this error?

Code used:

%function  single_linkage(depth,clrr)
X = randn(5000,3);
%X=XX;
% clr = clrr;
K=7;
clr = jet(K);
%// cluster into K=4
K = 7;
%CIDX = kmeans(X,K);


%// pairwise distances
SUBSET_SIZE = 1000;            %# subset size
ind = randperm(size(X,1));
data = X(ind(1:SUBSET_SIZE), :);
D = squareform(pdist(data));
subs = 1:size(D,1);
CIDX=kmeans(D, K,'distance','sqEuclidean', 'replicates',5);
centers = zeros(K, size(data,2));
for i=1:size(data,2)
    centers(:,i) = accumarray(CIDX, data(:,i), [], @mean);
end

%# calculate distance of each instance to all cluster centers
D = zeros(size(X,1), K);
for k=1:K
    D(:,k) = sum( bsxfun(@minus, X, centers(k,:)).^2, 2);
end
%D=squareform(D);
%# assign each instance to the closest cluster
[~,clustIDX] = min(D, [], 2);
%// for each pair of clusters
cpairs = nchoosek(1:K,2);
pairs = zeros(size(cpairs)); 
dists = zeros(size(cpairs,1),1);
for i=1:size(cpairs,1)
    %// index of points assigned to each of the two cluster
    idx1 = (clustIDX == cpairs(i,1));
    idx2 = (clustIDX == cpairs(i,2));

    %// shortest distance between the two clusters
    mn = min(min(D(idx1,idx2)));
    dists(i) = mn;

    %// corresponding pair of points with the minimum distance
    [r,c] = find(D(idx1,idx2)==mn);
    s1 = subs(idx1); s2 = subs(idx2);
    pairs(i,:) = [s1(r) s2(c)];
end

%// filter pairs by keeping only those whose distances is below a threshold
thresh = inf;
cpairs(dist>thresh,:) = [];

%// plot 3D points color-coded by clusters
figure('renderer','zbuffer')
%clr = lines(K);
h = zeros(1,K);
for i=1:K
h(i) = line(X(CIDX==i,1), X(CIDX==i,2), X(CIDX==i,3), ...
    'Color',clr(i,:), 'LineStyle','none', 'Marker','.', 'MarkerSize',5);
end
legend(h, num2str((1:K)', 'C%d'))   %'
view(3), axis vis3d, grid on

%// mark and connect nearest points between each pair of clusters
for i=1:size(pairs,1)
    line(X(pairs(i,:),1), X(pairs(i,:),2), X(pairs(i,:),3), ...
        'Color','k', 'LineStyle','-', 'LineWidth',3, ...
        'Marker','o', 'MarkerSize',10);
end

Recommended answer

What you are asking for sounds similar to what single-linkage clustering does at each step: working bottom-up, the clusters separated by the shortest distance are combined.

Anyway, below is the brute-force way of solving this. I'm sure there are more efficient implementations, but this one is easy to implement.

%// data of 3D points
X = randn(5000,3);

%// cluster into K=4
K = 4;
CIDX = kmeans(X,K);

%// pairwise distances
D = squareform(pdist(X));
subs = 1:size(X,1);

%// for each pair of clusters
cpairs = nchoosek(1:K,2);
pairs = zeros(size(cpairs));
dists = zeros(size(cpairs,1),1);
for i=1:size(cpairs,1)
    %// index of points assigned to each of the two cluster
    idx1 = (CIDX == cpairs(i,1));
    idx2 = (CIDX == cpairs(i,2));

    %// shortest distance between the two clusters
    mn = min(min(D(idx1,idx2)));
    dists(i) = mn;

    %// corresponding pair of points with the minimum distance
    %// (keep only the first match so that ties do not break the assignment)
    [r,c] = find(D(idx1,idx2)==mn, 1);
    s1 = subs(idx1); s2 = subs(idx2);
    pairs(i,:) = [s1(r) s2(c)];
end

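%// filter pairs by keeping only those whose distance is below a threshold
thresh = inf;    %// use your threshold value instead
keep = (dists <= thresh);
cpairs = cpairs(keep,:);
pairs = pairs(keep,:);   %// pairs is what gets plotted, so filter it too
dists = dists(keep);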

%// plot 3D points color-coded by clusters
figure('renderer','zbuffer')
clr = lines(K);
h = zeros(1,K);
for i=1:K
    h(i) = line(X(CIDX==i,1), X(CIDX==i,2), X(CIDX==i,3), ...
        'Color',clr(i,:), 'LineStyle','none', ...
        'Marker','.', 'MarkerSize',5);
end
legend(h, num2str((1:K)', 'C%d'))   %'
view(3), axis vis3d, grid on

%// mark and connect nearest points between each pair of clusters
for i=1:size(pairs,1)
    line(X(pairs(i,:),1), X(pairs(i,:),2), X(pairs(i,:),3), ...
        'Color','k', 'LineStyle','-', 'LineWidth',3, ...
        'Marker','o', 'MarkerSize',10);
end

Note that in the above example the data is randomly generated and not very interesting, so it is hard to see the connected nearest points.
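Regarding the update in the question: in the modified code, D is reassigned to the size(X,1)-by-K matrix of point-to-center distances, so D(idx1,idx2) no longer indexes a point-to-point distance matrix, which appears to be the source of the indexing error. One possible workaround, sketched below, avoids the full N-by-N matrix by computing distances for one pair of clusters at a time. It assumes pdist2 from the Statistics Toolbox is available; the fixed seed and the variable names s1, s2, Dij and k are only illustrative.

```matlab
%// sketch: nearest points between each pair of clusters, one pair at a time
rng(0);                          %// fixed seed, only so the run is repeatable
X = randn(5000,3);
K = 4;
CIDX = kmeans(X, K, 'replicates',5);

cpairs = nchoosek(1:K,2);
pairs = zeros(size(cpairs));     %// row indices into X of the nearest points
dists = zeros(size(cpairs,1),1);
for i=1:size(cpairs,1)
    %// row indices (into X) of the points in each of the two clusters
    s1 = find(CIDX == cpairs(i,1));
    s2 = find(CIDX == cpairs(i,2));

    %// distances between the two clusters only: numel(s1)-by-numel(s2)
    Dij = pdist2(X(s1,:), X(s2,:));

    %// smallest entry and its position; min returns the first one on ties
    [dists(i), k] = min(Dij(:));
    [r,c] = ind2sub(size(Dij), k);
    pairs(i,:) = [s1(r) s2(c)];
end
```

Each block Dij is only as large as the two clusters involved, so peak memory is lower than with squareform(pdist(X)). If a single block is still too large, the rows of one cluster could be processed in chunks in the same way.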

Just for fun, here is another result where I simply replaced the min-distance with the max-distance between each pair of clusters (similar to complete-linkage clustering), i.e. use:

mx = max(max(D(idx1,idx2)));

instead of the previous:

mn = min(min(D(idx1,idx2)));

which shows how we connect the farthest points between each pair of clusters. This visualization is a bit more interesting in my opinion :)
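To make the difference between the two choices concrete, here is a small hand-checkable sketch. The two clusters A and B are made-up points on the x-axis, and the distance matrix is built with bsxfun so that no toolbox is needed; none of these names come from the code above.

```matlab
%// two tiny hypothetical clusters on the x-axis
A = [0 0 0; 1 0 0];
B = [3 0 0; 6 0 0];

%// pairwise Euclidean distances, size(A,1)-by-size(B,1)
D = sqrt(sum(bsxfun(@minus, permute(A,[1 3 2]), permute(B,[3 1 2])).^2, 3));

%// nearest pair (single-linkage style)
[mn, k] = min(D(:));
[rmin, cmin] = ind2sub(size(D), k);

%// farthest pair (complete-linkage style)
[mx, k] = max(D(:));
[rmax, cmax] = ind2sub(size(D), k);

fprintf('nearest:  A(%d,:) and B(%d,:), distance %g\n', rmin, cmin, mn);
fprintf('farthest: A(%d,:) and B(%d,:), distance %g\n', rmax, cmax, mx);
```

Here the nearest pair is A(2,:) and B(1,:) at distance 2, while the farthest pair is A(1,:) and B(2,:) at distance 6, so switching min to max changes which points get connected.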
