Group n points in k clusters of equal size

Views: 214 Published: 2015/11/30 13:52:35 Tags: algorithm cluster-analysis k-means


Problem description

Possible duplicate:
K-means algorithm variation with equal cluster size (http://stackoverflow.com/questions/5452576/k-means-algorithm-variation-with-equal-cluster-size)

Edit: As casperOne pointed out to me, this question is a duplicate. In any case, here is a more general question that covers this one: http://stats.stackexchange.com/questions/8744/clustering-procedure-where-each-cluster-has-an-equal-number-of-points

My requirements

In a project I need to group n points (x, y) into k clusters of equal size (n / k), where x and y are double-precision floating-point numbers, n can range from 100 to 10000, and k can range from 2 to 100. Also, k is known before the algorithm runs.

My experiments

I started by using the K-means algorithm (http://en.wikipedia.org/wiki/K-means_clustering), which works well and quickly produces exactly k clusters of roughly the same size.

But my problem is this: K-means produces clusters of roughly the same size, whereas I need the clusters to be exactly the same size (or, to be more precise, I need each cluster to have a size between floor(n / k) and ceil(n / k)).
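To make the size requirement concrete, here is a minimal Java sketch of how the k target sizes could be derived from n and k. The class and method names (`ClusterSizes`, `targetSizes`) are hypothetical, and giving the larger size to the first n % k clusters is an arbitrary choice made for illustration.

```java
import java.util.Arrays;

final class ClusterSizes {

    // Returns k target sizes that sum to n. The first n % k entries are
    // ceil(n / k); the remaining entries are floor(n / k).
    static int[] targetSizes(int n, int k) {
        if (k <= 0 || n < 0) {
            throw new IllegalArgumentException("need k > 0 and n >= 0");
        }
        int[] sizes = new int[k];
        Arrays.fill(sizes, n / k);          // integer division gives floor(n / k)
        for (int i = 0; i < n % k; i++) {
            sizes[i]++;                     // distribute the remainder one by one
        }
        return sizes;
    }
}
```

For example, n = 10 and k = 3 gives sizes 4, 3, 3, and any n that is divisible by k gives k identical sizes.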

Before you point it out to me: yes, I tried the first answer to "K-means algorithm variation with equal cluster size" (http://stackoverflow.com/questions/5452576/k-means-algorithm-variation-with-equal-cluster-size), which sounds like a good idea.

The main idea is to post-process the array of clusters produced by K-means, going from the biggest cluster to the smallest. We reduce the size of each cluster that has more than n / k members by moving the extra points to the nearest other cluster, leaving alone the clusters that have already been reduced.

Here is the pseudocode I implemented:

n is the number of points
k is the number of clusters
m = n / k (the ideal cluster size)
c is the array of clusters after K-means
c' = c sorted by size in descending order
for each cluster i in c' where i = 1 to k - 1
    extra = size of cluster i - m (the number of points to move)
    loop extra times
        find a point p in cluster i with minimal distance to a cluster j in c' where j > i
        move point p from cluster i to cluster j
    end loop
    recalculate centroids
end for each
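The pseudocode above could be written in Java roughly as follows. This is a sketch under stated assumptions, not the code from the original project: points are stored as `double[][]` rows of {x, y}, the K-means result is an `int[]` of cluster indices, and the names `EqualSizePostProcess`, `rebalance`, and `centroids` are hypothetical. It also deviates from the pseudocode in two places: each round picks the largest cluster that has not been processed yet instead of relying on the initial sort, and the target size is floor(n / k) or ceil(n / k) so that a remainder is handled.

```java
final class EqualSizePostProcess {

    // Rebalances a K-means assignment in place so that every cluster ends up
    // with floor(n / k) or ceil(n / k) points.
    // points[p] = {x, y}; assignment[p] = cluster index in [0, k).
    static void rebalance(double[][] points, int[] assignment, int k) {
        int n = points.length;
        int[] size = new int[k];
        for (int a : assignment) size[a]++;
        boolean[] done = new boolean[k];
        double[][] centroid = centroids(points, assignment, k);

        for (int round = 0; round < k - 1; round++) {
            // Pick the largest cluster that has not been processed yet.
            int i = -1;
            for (int c = 0; c < k; c++) {
                if (!done[c] && (i < 0 || size[c] > size[i])) i = c;
            }
            // The first n % k processed clusters keep ceil(n / k) points.
            int target = n / k + (round < n % k ? 1 : 0);
            done[i] = true;

            for (int extra = size[i] - target; extra > 0; extra--) {
                // Find the point of cluster i closest to an unprocessed centroid.
                int bestP = -1, bestJ = -1;
                double best = Double.POSITIVE_INFINITY;
                for (int p = 0; p < n; p++) {
                    if (assignment[p] != i) continue;
                    for (int j = 0; j < k; j++) {
                        if (done[j]) continue;
                        double dx = points[p][0] - centroid[j][0];
                        double dy = points[p][1] - centroid[j][1];
                        double d = dx * dx + dy * dy;   // squared distance is enough
                        if (d < best) { best = d; bestP = p; bestJ = j; }
                    }
                }
                assignment[bestP] = bestJ;
                size[i]--;
                size[bestJ]++;
            }
            centroid = centroids(points, assignment, k);
        }
    }

    // Mean of the points in each cluster; an empty cluster keeps {0, 0}.
    static double[][] centroids(double[][] points, int[] assignment, int k) {
        double[][] c = new double[k][2];
        int[] count = new int[k];
        for (int p = 0; p < points.length; p++) {
            c[assignment[p]][0] += points[p][0];
            c[assignment[p]][1] += points[p][1];
            count[assignment[p]]++;
        }
        for (int j = 0; j < k; j++) {
            if (count[j] > 0) { c[j][0] /= count[j]; c[j][1] /= count[j]; }
        }
        return c;
    }
}
```

Each move scans every point of the current cluster against every unprocessed centroid, so this version favors readability over speed.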

The problem with this algorithm is that near the end of the process (when i comes close to k), we have to choose a cluster j in c' (where j > i, because we need to leave alone the clusters already processed), but the cluster j we find can be far from cluster i, which breaks the concept of a cluster.

My question

Is there a post-K-means algorithm or a K-means variant that can meet my requirements, or have I been wrong from the beginning and need to find another clustering algorithm?

PS: I do not mind implementing the solution myself, but it would be great if I could use a library, ideally in Java.
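One direction the question points toward is to enforce the size limit during the assignment step instead of afterwards. The Java sketch below is a greedy heuristic and an assumption on my part, not an answer taken from this page: it sorts all (point, centroid) pairs by distance and gives each point the nearest centroid that still has room. The names `CapacityAssign` and `assign` are hypothetical, and the centroids are assumed to come from an earlier K-means run.

```java
import java.util.Arrays;
import java.util.Comparator;

final class CapacityAssign {

    // Assigns every point to the nearest centroid that still has free capacity.
    // Cluster j ends up with exactly floor(n / k) or ceil(n / k) points.
    static int[] assign(double[][] points, double[][] centroids) {
        int n = points.length, k = centroids.length;
        int[] capacity = new int[k];
        for (int j = 0; j < k; j++) {
            capacity[j] = n / k + (j < n % k ? 1 : 0);
        }

        // Encode pair (p, j) as p * k + j and record its squared distance.
        Integer[] pairs = new Integer[n * k];
        double[] dist = new double[n * k];
        for (int p = 0; p < n; p++) {
            for (int j = 0; j < k; j++) {
                double dx = points[p][0] - centroids[j][0];
                double dy = points[p][1] - centroids[j][1];
                dist[p * k + j] = dx * dx + dy * dy;
                pairs[p * k + j] = p * k + j;
            }
        }
        Arrays.sort(pairs, Comparator.comparingDouble(e -> dist[e]));

        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);        // -1 marks a point not yet assigned
        for (int e : pairs) {
            int p = e / k, j = e % k;
            if (assignment[p] < 0 && capacity[j] > 0) {
                assignment[p] = j;
                capacity[j]--;
            }
        }
        return assignment;
    }
}
```

Alternating this step with a centroid update would give a K-means-like loop with fixed cluster sizes. Because the assignment is greedy, points handled late can still land in a distant cluster, so the result is not guaranteed to be optimal.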
