Appropriate similarity metrics for multiple sets of 2D coordinates

Question

I have a collection of 2D coordinate sets (on the order of 100K-500K points in each set), and I am looking for the most efficient way to measure the similarity of one set to another. I know the usual measures: cosine, Jaccard/Tanimoto, etc. However, I am hoping for suggestions on fast, efficient ways to measure similarity, especially ones that lend themselves to clustering by similarity.

Edit 1: The image shows what I need to do. I need to cluster all the reds, blues, and greens by their shape, orientation, etc.

Image: http://img402.imageshack.us/img402/8121/curves.png

Recommended answer

It seems that the first step of any solution is going to be to find the centroid, or other reference point, of each shape, so that they can be compared regardless of absolute position.
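As a minimal sketch of that first step, assuming each set is held as an (N, 2) NumPy array, subtracting the mean point moves the centroid to the origin. The function name center_on_centroid is made up for this illustration and is not from the original answer.

```python
import numpy as np

def center_on_centroid(points):
    """Translate an (N, 2) array of points so its centroid sits at the origin."""
    pts = np.asarray(points, dtype=float)
    return pts - pts.mean(axis=0)

# Two copies of the same square, one shifted far away (illustrative data).
a = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]])
b = a + np.array([100.0, -50.0])

# After centering, absolute position no longer matters.
print(np.allclose(center_on_centroid(a), center_on_centroid(b)))
```

The mean is only one possible reference point; a median or bounding-box center would be less sensitive to outliers, at the cost of a different notion of "center".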

One algorithm that comes to mind would be to start at the point nearest the centroid and walk to its nearest neighbors. Compare the offsets of those neighbors (from the centroid) between the sets being compared. Keep walking to the next-nearest neighbors of the centroid, or to the nearest not-yet-compared neighbors of the points previously compared, and keep track of the aggregate difference (perhaps RMS?) between the two shapes. Also, at each step of this process, calculate the rotational offset that would bring the two shapes into closest alignment [and whether mirroring affects it as well?]. When you are finished you will have three values for every pair of sets: their direct similarity, their relative rotational offset (mostly only useful if they are close matches after rotation), and their similarity after rotation.
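The following is a rough sketch of one way to get those three values, not a definitive implementation of the walk described above. It makes several simplifying assumptions: points are paired by their rank in distance from the centroid (the "next-nearest neighbors of the centroid" variant), only the first n points of each ordering are compared when the sets differ in size, the rotation is estimated once over all pairs rather than at each step, and mirroring is not handled. The names order_outward and compare_shapes are invented for this example.

```python
import numpy as np

def order_outward(points):
    """Center on the centroid, then sort points by distance from it."""
    pts = np.asarray(points, dtype=float)
    pts = pts - pts.mean(axis=0)
    return pts[np.argsort(np.hypot(pts[:, 0], pts[:, 1]))]

def compare_shapes(a, b):
    """Return (direct_rms, rotation_radians, rotated_rms) for two point sets."""
    a, b = order_outward(a), order_outward(b)
    n = min(len(a), len(b))  # assumption: pair the first n points of each ordering
    a, b = a[:n], b[:n]

    # Direct similarity: RMS distance between paired offsets from the centroid.
    direct_rms = np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1)))

    # Closed-form least-squares angle that rotates b onto a in 2D.
    dot = np.sum(a * b)
    cross = np.sum(b[:, 0] * a[:, 1] - b[:, 1] * a[:, 0])
    theta = np.arctan2(cross, dot)

    # Similarity after applying that rotation to b.
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    rotated_rms = np.sqrt(np.mean(np.sum((a - b @ rot.T) ** 2, axis=1)))
    return direct_rms, theta, rotated_rms

# Illustrative data: b is a rotated by 30 degrees and shifted.
rng = np.random.default_rng(0)
a = rng.normal(size=(200, 2))
phi = np.deg2rad(30)
r = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
b = a @ r.T + np.array([5.0, -3.0])

direct, theta, rotated = compare_shapes(a, b)
print(direct, np.rad2deg(theta), rotated)
```

In this example the recovered angle should be about -30 degrees (the rotation that takes b back onto a) and the post-rotation RMS should be close to zero. Rank-by-radius pairing is crude for real curves, where points at similar radii can lie on opposite sides of the shape, so the pairing step is the part most worth replacing with a true nearest-neighbor walk or a spatial index.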
