How would I cluster an unordered list of locations?


Question

Possible Duplicate:
Clustering Algorithm for Mapping Application

I have an unordered List of locations (each containing its coordinates). I know to use the Haversine formula to calculate the distance between two points, but the clustering solutions I've looked at say I'd need to order the list first. What is the correct ordering for locations? I want to cluster all locations that are within 1 metre of each other (i.e. put them into a single clusteredLocation object). Is this feasible without sorting first?

Recommended answer

Actually, none of the cluster-analysis algorithms I know of requires the points to be ordered; that would somewhat defeat the whole purpose of cluster analysis. But maybe you are thinking more of the Web 2.0 marker-clusterer kind of aggregation?

Have a look at k-means, single-linkage and DBSCAN. All are well described on Wikipedia, with Cluster Analysis as the hub article. None of these requires your points to be ordered.
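To illustrate that no ordering is needed, here is a minimal sketch of threshold-based single-linkage clustering: any two points within 1 metre (by Haversine distance) are merged into the same cluster using a union-find structure. The names `haversine_m`, `cluster_within` and the Earth radius constant are assumptions made for this example, not something from the question. Note that single-linkage chains points together, so two members of the same cluster can be more than 1 metre apart if intermediate points connect them.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in metres


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to guard against rounding pushing h slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, h)))


def cluster_within(points, max_dist_m=1.0):
    """Group points into connected components of the 'within max_dist_m' graph.

    The input order is irrelevant: every pair is compared, so this is O(n^2).
    """
    parent = list(range(len(points)))

    def find(i):
        # Find the representative of i's set, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if haversine_m(points[i], points[j]) <= max_dist_m:
            parent[find(i)] = find(j)  # merge the two sets

    clusters = {}
    for i, point in enumerate(points):
        clusters.setdefault(find(i), []).append(point)
    return list(clusters.values())


if __name__ == "__main__":
    locations = [(52.0, 4.0), (53.0, 4.0), (52.00001, 4.0), (52.000005, 4.0)]
    for cluster in cluster_within(locations):
        print(cluster)
```

The pairwise loop is fine for small lists; for large data sets a spatial index or an existing DBSCAN implementation would likely be a better fit.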

Note that Haversine distance is not appropriate for k-means or average-linkage clustering unless you find a smart way of computing a mean that minimizes variance. Do not use the arithmetic average of latitude-longitude coordinates if your data crosses the -180/+180 longitude wrap-around. Single-linkage, complete-linkage, DBSCAN and OPTICS should all be fine.
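The wrap-around problem can be seen with two points at longitudes 179 and -179: their arithmetic average is 0, on the opposite side of the globe. One common workaround, sketched below, is to average the points as 3D unit vectors and convert the result back to latitude and longitude. The function name `spherical_mean` is an assumption for this example, and this is only one way of defining a mean on the sphere; it is not claimed to minimize Haversine-based variance.

```python
import math


def spherical_mean(points):
    """Mean of (lat, lon) pairs in degrees, computed via 3D unit vectors.

    Undefined if the vectors cancel out (e.g. two antipodal points).
    """
    x = y = z = 0.0
    for lat_deg, lon_deg in points:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
    # The direction of the summed vector gives the mean position.
    mean_lon = math.atan2(y, x)
    mean_lat = math.atan2(z, math.hypot(x, y))
    return math.degrees(mean_lat), math.degrees(mean_lon)


if __name__ == "__main__":
    pts = [(0.0, 179.0), (0.0, -179.0)]
    naive_lon = sum(lon for _, lon in pts) / len(pts)
    print("arithmetic mean longitude:", naive_lon)    # lands at 0, far from both points
    print("vector mean (lat, lon):", spherical_mean(pts))  # lands near the 180 meridian
```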
