3D聚类算法 [英] 3D clustering Algorithm

查看:175
本文介绍了3D聚类算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

问题陈述:
我有以下问题:

Problem Statement: I have the following problem:

3D空间中的点数超过十亿。目的是找到在给定距离R内具有最大邻居数的前N个点。另一个条件是,这些前N个点中任意两个点之间的距离必须大于R。这些点的分布不均匀。在空间的某些区域包含很多点是很常见的。

There are more than a billion points in 3D space. The goal is to find the top N points which has largest number of neighbors within given distance R. Another condition is that the distance between any two points of those top N points must be greater than R. The distribution of those points are not uniform. It is very common that certain regions of the space contain a lot of points.

目标:
寻找可以

Goal: To find an algorithm that can scale well to many processors and has a small memory requirement.

思路:
对于这种类型的存储,正常的空间分解是不够的。由于分布不均匀而出现问题。均匀划分点数的不规则空间分解可能会帮助解决这个问题。我真的很感激,如果有人可以阐明如何解决此问题。

Thoughts: Normal spatial decomposition is not sufficient for this kind of problem due to the non-uniform distribution. irregular spatial decomposition that evenly divide the number of points may help us the problem. I will really appreciate that if someone can shed some lights on how to solve this problem.

推荐答案

使用八叉树。对于具有有限值域的3D数据,它可以很好地扩展到庞大的数据集。

Use an Octree. For 3D data with a limited value domain that scales very well to huge data sets.

许多上述方法,例如 localitysensitive hash 近似的版本设计用于更高的 维度,您将无法再进行合理的拆分。

Many of the aforementioned methods such as locality sensitive hashing are approximate versions designed for much higher dimensionality where you can't split sensibly anymore.

在每个级别上分为8个容器(2 ^ d用于d = 3)效果很好。而且由于您可以在单元中的点太少时停下来,并构建一棵更深的树,其中有很多点非常适合您的需求。

Splitting at each level into 8 bins (2^d for d=3) works very well. And since you can stop when there are too few points in a cell, and build a deeper tree where there are a lot of points that should fit your requirements quite well.

有关更多详细信息,请参见Wikipedia:

For more details, see Wikipedia:

https:/ /en.wikipedia.org/wiki/Octree

或者,您可以尝试构建R树。但是R树试图保持平衡,因此很难找到最密集的区域。对于您的特定任务,Octree的缺点实际上很有帮助! R树付出了很多努力来使树的深度在任何地方都相等,以便可以在大约同一时间找到每个点。但是,您只对密集区域感兴趣,这些区域将在Octree中最长的路径上出现,而无需查看实际点!

Alternatively, you could try to build an R-tree. But the R-tree tries to balance, making it harder to find the most dense areas. For your particular task, this drawback of the Octree is actually helpful! The R-tree puts a lot of effort into keeping the tree depth equal everywhere, so that each point can be found at approximately the same time. However, you are only interested in the dense areas, which will be found on the longest paths in the Octree without even having to look at the actual points yet!

这篇关于3D聚类算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆