地图聚类算法 [英] Map Clustering Algorithm

查看:1662
本文介绍了地图聚类算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前的code是pretty的快,但我需要使它变得更快,所以我们可以容纳更多的标记。有什么建议?

注:

  • 的code运行速度最快的SQL语句时被标记的名字排序 - 这本身是聚类标记的一个很偏的工作(标记的名称,在相同的位置通常是,但并非总是如此相似)<。 / LI>
  • 在我不能pre簇标记,因为它们可以动态地搜索和过滤。
  • 在我试过的基于网格的聚类 - 但结果往往不是很好
  • 我知道簇稍微倾斜在墨卡托投影。
  • 在我没有兴趣在商业集群服务。

在code:

  $ singleMarkers =阵列();
$ clusterMarkers =阵列();

而(计数($标记)){
    $标记= array_pop($标记);
    $集群=阵列();

    //比较的指标,所有剩余的标记。
    的foreach($标记为$关键=&GT; $ compareMarker){
        //该函数返回两个标记之间的距离,在一个定义的缩放级别。
        $像素= pixelDistance($标记['纬度'],$标记['LNG'],$ compareMarker ['纬度'],$ compareMarker ['LNG'],$当zoomLevel =);
        //如果两个指标都低于规定的距离更近,从数组中删除compareMarker,并添加到集群。
        如果($像素&LT; $距离){
            取消设置($标记[$关键]);
            $集群[] = $ compareMarker;
        }
    }

    //如果一个标记被添加到集群中,还加入我们被比作标记。
    如果(计数($簇)大于0){
        $集群[] = $标记;
        $ clusterMarkers [] = $集群;
    } 其他 {
        $ singleMarkers [] = $标记;
    }
}

功能pixelDistance($ LAT1,$ lon1,$ LAT2,$ lon2,$变焦){
    $ X1 = $ lon1 *千万; //这是我做过什么来补偿使用而不是像素纬度/经度值。
    $ Y1 = $ LAT1 *千万;
    $ X2 = $ lon2 *千万;
    $ Y2 = $ LAT2 *千万;

    返回的sqrt(POW(($ $ X1- X2),2)+ POW(($ $ Y1- Y2),2))&GT;&GT; (21  -  $变焦); // 21是最大缩放级别
}
 


更新

下面是目前的code:

  $ singleMarkers =阵列();
$ clusterMarkers =阵列();

//要包含在一个簇标记间的最小距离,在差异。缩放级别
$距离=(10000000&GT;&GT; $ ZOOM)/ 100000;

//循环,直到所有的标志进行了比较。
而(计数($标记)){
    $标记= array_pop($标记);
    $集群=阵列();

    //对这些留给所有标记进行比较。
    的foreach($标记为$关键=&GT; $目标){
        $像素= ABS($标记['纬度']  -  $目标['纬度'])+ ABS($标记['LNG']  -  $目标['LNG']);

        //如果这两个指标更接近比给定距离删除目标标记的阵列,并将其添加到集群。
        如果($像素&LT; $距离){
            取消设置($标记[$关键]);
            $集群[] = $的目标;
        }
    }

    //如果一标志已被添加到集群,还添加所述一个我们比较。
    如果(计数($簇)大于0){
        $集群[] = $标记;
        $ clusterMarkers [] = $集群;
    } 其他 {
        $ singleMarkers [] = $标记;
    }
}
 

解决方案

你真的需要计算欧氏距离?如果你只是比较距离的相对大小,你也许可以逃脱使用曼哈顿距离,这将节省你们两个调用 POW()和一个的sqrt()

 函数pixelDistance($ LAT1,$ lon1,$ LAT2,$ lon2,$变焦){
    $ X1 = $ lon1 *千万; //这是我做过什么来补偿使用而不是像素纬度/经度值。
    $ Y1 = $ LAT1 *千万;
    $ X2 = $ lon2 *千万;
    $ Y2 = $ LAT2 *千万;

    返程($ X 1  -  $ 2次)+($ Y1〜$ Y2)&GT;&GT; (21  -  $变焦);
}
 

不知道,如果你需要的>>(21 - $变焦)位的计算,所以我离开它,但除非你真正需要使用计算。其他地方的距离值,你也许可以逃脱只需使用纬度/经度直接(无需通过任何繁殖),并采取曼哈顿距离,假设你pre-计算 $距离以配合这一措施,这将是一个很大的计算方面比便宜强迫所有的距离,以配合单位和 $距离幅度。

编辑:当我在去年的研究这个问题,我发现一些有用的东西在维基百科 - 是的,它可以发生; - )

我也极力推荐这本书的 集体智慧编程:构建智能的Web 2.0应用程序 其进入聚集在巨大的深度,不仅适用于地理数据,而且数据分析等领域。

My current code is pretty quick, but I need to make it even faster so we can accommodate even more markers. Any suggestions?

Notes:

  • The code runs fastest when the SQL statement is ordered by marker name - which itself does a very partial job of clustering the markers (the names of markers in the same location are often, but not always similar).
  • I can't pre-cluster the markers, because they can be dynamically searched and filtered.
  • I've tried grid-based clustering - but the results often aren't very nice.
  • I know that the clusters are slightly skewed on a Mercator projection.
  • I'm not interested in a commercial clustering service.

The code:

$singleMarkers = array();
$clusterMarkers = array();

while (count($markers)) {
    $marker  = array_pop($markers);
    $cluster = array();

    // Compare marker against all remaining markers.
    foreach ($markers as $key => $compareMarker) {
        // This function returns the distance between two markers, at a defined zoom level.
        $pixels = pixelDistance($marker['lat'], $marker['lng'], $compareMarker['lat'], $compareMarker['lng'], $zoomLevel);
        // If two markers are closer than defined distance, remove compareMarker from array and add to cluster.
        if ($pixels < $distance) {
            unset($markers[$key]);
            $cluster[] = $compareMarker;
        }
    }

    // If a marker was added to cluster, also add the marker we were comparing to.
    if (count($cluster) > 0) {
        $cluster[] = $marker;
        $clusterMarkers[] = $cluster;
    } else {
        $singleMarkers[] = $marker;
    }
}

function pixelDistance($lat1, $lon1, $lat2, $lon2, $zoom) {
    $x1 = $lon1*10000000; //This is what I did to compensate for using lat/lon values instead of pixels.
    $y1 = $lat1*10000000;
    $x2 = $lon2*10000000;
    $y2 = $lat2*10000000;

    return sqrt(pow(($x1-$x2),2) + pow(($y1-$y2),2)) >> (21 - $zoom); //21 is the max zoom level
}


UPDATE

Here's the current code:

$singleMarkers = array();
$clusterMarkers = array();

// Minimum distance between markers to be included in a cluster, at diff. zoom levels
$DISTANCE = (10000000 >> $ZOOM) / 100000;

// Loop until all markers have been compared.
while (count($markers)) {
    $marker  = array_pop($markers);
    $cluster = array();

    // Compare against all markers which are left.
    foreach ($markers as $key => $target) {
        $pixels = abs($marker['lat']-$target['lat']) + abs($marker['lng']-$target['lng']);

        // If the two markers are closer than given distance remove target marker from array and add it to cluster.
        if ($pixels < $DISTANCE) {
            unset($markers[$key]);
            $cluster[] = $target;
        }
    }

    // If a marker has been added to cluster, add also the one we were comparing to.
    if (count($cluster) > 0) {
        $cluster[] = $marker;
        $clusterMarkers[] = $cluster;
    } else {
        $singleMarkers[] = $marker;
    }
}

解决方案

Do you actually need to calculate the Euclidean distance? If you are just comparing relative magnitudes of distances, you can probably get away with using the Manhattan distance, which will save you two calls to pow() and one to sqrt():

function pixelDistance($lat1, $lon1, $lat2, $lon2, $zoom) {
    $x1 = $lon1*10000000; //This is what I did to compensate for using lat/lon values instead of pixels.
    $y1 = $lat1*10000000;
    $x2 = $lon2*10000000;
    $y2 = $lat2*10000000;

    return ($x1-$x2) + ($y1-$y2) >> (21 - $zoom);
}

Not sure if you need the >> (21 - $zoom) bit for your calculations, so I left it in. But unless you actually need to use the calculated distance values elsewhere, you can probably get away with just using the latitude/longitude directly (no need to multiply by anything) and taking the Manhattan distance, assuming you pre-calculate $distance to fit in with that measure, which will be a lot cheaper in computational terms than coercing all the distances to fit in with the units and magnitude of $distance.

EDIT: When I was researching this problem last year, I found some useful stuff on Wikipedia - yes, it can happen ;-)

I can also highly recommend the book Programming Collective Intelligence: Building Smart Web 2.0 Applications which goes into clustering in great depth, as applied not only to geographical data but also to other areas of data analysis.

这篇关于地图聚类算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆