Appropriate similarity metrics for multiple sets of 2D coordinates

Question

I have a collection of 2D coordinate sets (on the scale of 100K-500K points in each set) and I am looking for the most efficient way to measure the similarity of one set to another. I know of the usual measures: cosine, Jaccard/Tanimoto, etc. However, I am hoping for suggestions on fast, efficient measures of similarity, especially ones that can be used to cluster the sets by similarity.

Edit 1: The image shows what I need to do. I need to cluster all the reds, blues and greens by their shape, orientation, etc.

Recommended answer

It seems that the first step of any solution is going to be to find the centroid, or some other reference point, of each shape, so that the shapes can be compared regardless of absolute position.
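As a minimal sketch of that first step, assuming each set is a plain list of (x, y) tuples and that the centroid is taken as the arithmetic mean of the points (the function names `centroid` and `center` are mine, not from the answer):

```python
from typing import List, Tuple

Point = Tuple[float, float]


def centroid(points: List[Point]) -> Point:
    """Arithmetic mean of the points (assumes a non-empty list)."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    return (sum_x / n, sum_y / n)


def center(points: List[Point]) -> List[Point]:
    """Translate the points so that their centroid sits at the origin."""
    cx, cy = centroid(points)
    return [(x - cx, y - cy) for x, y in points]


square = [(1, 1), (3, 1), (3, 3), (1, 3)]
shifted = [(x + 100, y - 50) for x, y in square]

# After centering, a shape and its translated copy have the same coordinates.
print(center(square) == center(shifted))
```

Once every set is centered this way, any later comparison sees only shape and orientation, not where the set happened to sit in the plane.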

One algorithm that comes to mind is to start at the point nearest the centroid and walk outward to its nearest neighbors. Compare the offsets of those neighbors (measured from the centroid) between the two sets being compared. Keep walking to the next-nearest neighbors of the centroid, or to the nearest not-yet-compared neighbors of the points already compared, and keep track of the aggregate difference (perhaps RMS?) between the two shapes. Also, at each step of this process, calculate the rotational offset that would bring the two shapes into closest alignment (and perhaps whether mirroring affects it as well). When you are finished you will have three values for every pair of sets: their direct similarity, their relative rotational offset (mostly useful only if they are close matches after rotation), and their similarity after rotation.
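The sketch below is one simplified reading of that idea, not the answer's exact procedure. It makes several assumptions of its own: the outward walk is approximated by sorting the centered points by distance from the centroid, points are paired by their rank in that ordering (so both sets must have the same number of points, and ties in distance make the pairing unreliable), and the rotation is computed once over all pairs with the closed-form least-squares angle rather than step by step. Mirroring is not handled. The name `compare_shapes` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def center(points: List[Point]) -> List[Point]:
    """Translate the points so that their centroid sits at the origin."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]


def compare_shapes(a: List[Point], b: List[Point]) -> Tuple[float, float, float]:
    """Return (direct_rms, rotation_radians, rotated_rms) for two point sets.

    Assumption: both sets have the same size and points correspond by their
    rank in distance from the centroid.
    """
    if len(a) != len(b) or not a:
        raise ValueError("this sketch needs two non-empty sets of equal size")

    def radius(p: Point) -> float:
        return math.hypot(p[0], p[1])

    # Approximate the walk outward from the centroid by sorting on radius.
    a_c = sorted(center(a), key=radius)
    b_c = sorted(center(b), key=radius)
    pairs = list(zip(a_c, b_c))
    n = len(pairs)

    # RMS distance between paired points with no rotation applied.
    direct = math.sqrt(
        sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in pairs) / n
    )

    # Least-squares angle that rotates set a onto set b.
    dot = sum(ax * bx + ay * by for (ax, ay), (bx, by) in pairs)
    cross = sum(ax * by - ay * bx for (ax, ay), (bx, by) in pairs)
    theta = math.atan2(cross, dot)

    # RMS distance after rotating set a by theta.
    c, s = math.cos(theta), math.sin(theta)
    rotated = math.sqrt(
        sum(
            (c * ax - s * ay - bx) ** 2 + (s * ax + c * ay - by) ** 2
            for (ax, ay), (bx, by) in pairs
        )
        / n
    )
    return direct, theta, rotated


# Set b is set a rotated by 90 degrees and then translated.
a = [(0, 0), (4, 0), (0, 1)]
b = [(10, 5), (10, 9), (9, 5)]
direct, theta, rotated = compare_shapes(a, b)
print(direct, math.degrees(theta), rotated)
```

The three returned values correspond to the three quantities named above. For sets of 100K-500K points with unequal sizes, the rank-based pairing would have to be replaced by a real nearest-neighbor search, which this sketch does not attempt.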
