一种高效的算法,可按每两点之间的距离对聚类中的点进行分组 [英] Efficient algorithm to group points in clusters by distance between every two points
问题描述
我正在寻找一种解决以下问题的有效算法:
I am looking for an efficient algorithm for the following problem:
给出2D空间中的一组点,其中每个点由其X和Y坐标定义.需要将此点集划分为一组簇,以便如果两个任意点之间的距离小于某个阈值,则这些点必须属于同一簇:
Given a set of points in 2D space, where each point is defined by its X and Y coordinates. Required to split this set of points into a set of clusters so that if distance between two arbitrary points is less then some threshold, these points must belong to the same cluster:
换句话说,这样的簇是一组相互足够接近"的点.
In other words, such cluster is a set of points which are 'close enough' to each other.
朴素的算法可能看起来像这样:
The naive algorithm may look like this:
- 让 R 为群集的结果列表,最初为空
- 让 P 为点列表,最初包含所有点
- 从 P 中选择随机点,并创建仅包含此点的群集 C .从 P 删除此点
- 对于 P 中的每个点 Pi 4a.对于 C 中的每个点 Pc 4aa.如果距离(Pi,Pc)<阈值,然后将 Pi 添加到 C 并将其从 P 中删除
- 如果在步骤4中向集群 C 添加了至少一个点,请转到步骤4
- 添加群集 C 以列出 R .如果 P 不为空,请转到步骤3
- Let R be a resulting list of clusters, initially empty
- Let P be a list of points, initially contains all points
- Pick random point from P and create a cluster C which contains only this point. Delete this point from P
- For every point Pi from P 4a. For every point Pc from C 4aa. If distance(Pi, Pc) < threshold then add Pi to C and remove it from P
- If at least one point was added to cluster C during the step 4, go to step 4
- Add cluster C to list R. if P is not empty, go to step 3
但是,幼稚的方法效率很低.我想知道是否有更好的算法来解决这个问题?
However, naive approach is very inefficient. I wonder if there is a better algorithm for this problem?
P.S.我不知道先验簇的数量
推荐答案
这里有一些经典算法:
- 分层聚集聚类
- DBSCAN
您应该阅读和理解.
这篇关于一种高效的算法,可按每两点之间的距离对聚类中的点进行分组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!