在最小尺寸的组中对项目进行最佳分组/聚类 [英] Optimal grouping/clustering of items in groups with minimum size

查看:100
本文介绍了在最小尺寸的组中对项目进行最佳分组/聚类的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找一种解决以下问题的算法:




  • 给出:一组项目及其相似性矩阵。

    li>
  • 目标:将这些项目分组为最小尺寸为 m
  • 的集群
  • 条件:


    • 数据集中没有类似簇的结构,如
      和类似的优化问题。



      在没有任何其他约束的情况下,您的问题也未得到明确说明。



      为什么不尝试贪婪的启发式方法(用于背包之类的问题)。随机选择任意点,添加足够的点以满足最小尺寸限制。



      然后从中选择最远的点,然后再次添加足够的点以满足最小尺寸。重复(使用距离总和进行排名),直到您不能再满足最小大小为止。然后将每个剩余点添加到最近的群集中。最后,只要满足最小大小,优化程序是否会将移动点传递给其他集群?


      I am looking for an algorithm that solves the following problem:

      • Given: a set of items and their similarity matrix.
      • Goal: group these items in "clusters" of minimum size m
      • Conditions:
        • There are no cluster-like structures in the dataset, as shown in Figure 1
        • Anyway, the items in a group should be similar to each other. Thus, the global similarity would be high.

      The motivation is not to identify good clusters but to split a dataset into groups of high similarity and of minimum size. Partitioning around medoids does not work out-of-the box, it would produce a lot of 1-item-clusters. Hierarchical approaches (AGNES, DIANA) does not help either.

      This problem is someway similar to Stable Marriage problems: one tries to rank the neighbored items by similarity. But here, there are at least m items in one group / one marriage.

      Thanks in advance!

      解决方案

      This is not clustering then. A clustering algorithm should tell your that there are no clusters there. Your task sounds more like bin packing, knapsack and similar optimization problems to me.

      Without any further constraints, your problem is also underspecified.

      Why don't you try a greedy heuristic (which is common to use for knapsack like problems). Choose any point at random, add enough points to satisfy your minimum size constraint.

      Then choose the furthest point from this, and add again enough points to satisfy your minimum size. Repeat (using sum of distances for ranking), until you no longer can satisfy the minimum size. Then add every remaining point to the nearest cluster each. Finally, do an optimization pass moving points to other clusters as long as the minimum size is satisfied?

      这篇关于在最小尺寸的组中对项目进行最佳分组/聚类的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆