3D聚类算法 [英] 3D clustering Algorithm

查看:33
本文介绍了3D聚类算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

问题陈述:我有以下问题:

3D 空间中有超过 10 亿个点.目标是找到在给定距离R内具有最多邻居数的前N个点.另一个条件是前N个点中任意两点之间的距离必须大于R.这些点的分布是不均匀的.空间的某些区域包含很多点是很常见的.

There are more than a billion points in 3D space. The goal is to find the top N points which has largest number of neighbors within given distance R. Another condition is that the distance between any two points of those top N points must be greater than R. The distribution of those points are not uniform. It is very common that certain regions of the space contain a lot of points.

目标:寻找一种可以很好地扩展到许多处理器并且内存要求很小的算法.

Goal: To find an algorithm that can scale well to many processors and has a small memory requirement.

想法:由于分布不均匀,正态空间分解不足以解决此类问题.均匀划分点数的不规则空间分解可能会帮助我们解决问题.如果有人能就如何解决这个问题提供一些启发,我将不胜感激.

Thoughts: Normal spatial decomposition is not sufficient for this kind of problem due to the non-uniform distribution. irregular spatial decomposition that evenly divide the number of points may help us the problem. I will really appreciate that if someone can shed some lights on how to solve this problem.

推荐答案

使用 八叉树.对于具有有限值域的 3D 数据,可以很好地扩展到庞大的数据集.

Use an Octree. For 3D data with a limited value domain that scales very well to huge data sets.

上述许多方法,例如局部敏感散列,都是为更高维度而设计的近似版本,您无法再明智地进行拆分.

Many of the aforementioned methods such as locality sensitive hashing are approximate versions designed for much higher dimensionality where you can't split sensibly anymore.

在每个级别拆分为 8 个 bin(d=3 时为 2^d)效果很好.由于您可以在单元格中的点太少时停止,并构建一个更深的树,其中有很多点应该非常适合您的要求.

Splitting at each level into 8 bins (2^d for d=3) works very well. And since you can stop when there are too few points in a cell, and build a deeper tree where there are a lot of points that should fit your requirements quite well.

有关详细信息,请参阅维基百科:

For more details, see Wikipedia:

https://en.wikipedia.org/wiki/Octree

或者,您可以尝试构建 R 树.但是 R 树试图平衡,因此更难找到最密集的区域.对于您的特定任务,八叉树的这个缺点实际上很有帮助!R-tree 付出了很多努力来保持树的深度在各处相等,以便可以大约在同一时间找到每个点.但是,您只对密集区域感兴趣,这些区域可以在八叉树中最长的路径上找到,甚至无需查看实际点!

Alternatively, you could try to build an R-tree. But the R-tree tries to balance, making it harder to find the most dense areas. For your particular task, this drawback of the Octree is actually helpful! The R-tree puts a lot of effort into keeping the tree depth equal everywhere, so that each point can be found at approximately the same time. However, you are only interested in the dense areas, which will be found on the longest paths in the Octree without even having to look at the actual points yet!

这篇关于3D聚类算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆