一种高效的算法,可按每两点之间的距离对聚类中的点进行分组 [英] Efficient algorithm to group points in clusters by distance between every two points

查看:59
本文介绍了一种高效的算法,可按每两点之间的距离对聚类中的点进行分组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找一种解决以下问题的有效算法:

I am looking for an efficient algorithm for the following problem:

给出2D空间中的一组点,其中每个点由其X和Y坐标定义.需要将此点集划分为一组簇,以便如果两个任意点之间的距离小于某个阈值,则这些点必须属于同一簇:

Given a set of points in 2D space, where each point is defined by its X and Y coordinates. Required to split this set of points into a set of clusters so that if distance between two arbitrary points is less then some threshold, these points must belong to the same cluster:

换句话说,这样的簇是一组相互足够接近"的点.

In other words, such cluster is a set of points which are 'close enough' to each other.

朴素的算法可能看起来像这样:

The naive algorithm may look like this:

  1. R 为群集的结果列表,最初为空
  2. P 为点列表,最初包含所有点
  3. P 中选择随机点,并创建仅包含此点的群集 C .从 P
  4. 删除此点
  5. 对于 P 中的每个点 Pi 4a.对于 C 中的每个点 Pc 4aa.如果距离(Pi,Pc)<阈值,然后将 Pi 添加到 C 并将其从 P
  6. 中删除
  7. 如果在步骤4中向集群 C 添加了至少一个点,请转到步骤4
  8. 添加群集 C 以列出 R .如果 P 不为空,请转到步骤3
  1. Let R be a resulting list of clusters, initially empty
  2. Let P be a list of points, initially contains all points
  3. Pick random point from P and create a cluster C which contains only this point. Delete this point from P
  4. For every point Pi from P 4a. For every point Pc from C 4aa. If distance(Pi, Pc) < threshold then add Pi to C and remove it from P
  5. If at least one point was added to cluster C during the step 4, go to step 4
  6. Add cluster C to list R. if P is not empty, go to step 3

但是,幼稚的方法效率很低.我想知道是否有更好的算法来解决这个问题?

However, naive approach is very inefficient. I wonder if there is a better algorithm for this problem?

P.S.我不知道先验簇的数量

推荐答案

这里有一些经典算法:

  • 分层聚集聚类
  • DBSCAN

您应该阅读和理解.

这篇关于一种高效的算法,可按每两点之间的距离对聚类中的点进行分组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆