How to cluster points and plot

Problem description

I am trying to use clustering in R. I am a rookie and haven't worked much with R.

I have the geolocation points as latitude and longitude values. I want to use this data to find hotspots.

I am looking to create clusters of 4 or more points that are within 600 feet of each other.

I want to get the centroids of such clusters and plot them.

The data looks like this:

LATITUDE    LONGITUD
32.70132    -85.52518
34.74251    -86.88351
32.55205    -87.34777
32.64144    -85.35430
34.92803    -87.81506
32.38016    -86.29790
32.42127    -87.08690
...

structure(list(LATITUDE = c(32.70132, 34.74251, 32.55205, 32.64144, 
34.92803, 32.38016, 32.42127, 32.9095, 33.58092, 32.51617, 33.5726, 
33.83251, 34.65639, 34.27694, 33.73851, 33.95132, 31.35445, 34.05263, 
33.37959, 30.50248, 32.31561, 32.66919, 31.75039, 33.56986, 33.27091, 
33.93598, 32.30964, 31.09773, 32.26711, 33.54263, 34.72014, 34.78548, 
30.65705, 31.25939, 31.27647, 30.54322, 31.22416, 33.38549, 33.18338, 
31.16811, 32.38368, 32.36253, 31.14464), LONGITUD = c(-85.52518, 
-86.88351, -87.34777, -85.3543, -87.81506, -86.2979, -87.0869, 
-85.75888, -86.27647, -86.21179, -86.65275, -87.2696, -85.72738, 
-87.71489, -86.48934, -86.29693, -88.22943, -87.55328, -85.31454, 
-87.79342, -86.88108, -86.26669, -88.04425, -86.44631, -87.74383, 
-87.72403, -86.28067, -85.4449, -87.62541, -86.56251, -86.48971, 
-85.59656, -88.24491, -86.60828, -86.18112, -88.22778, -85.63784, 
-86.03297, -87.55456, -85.37719, -86.38047, -86.21579, -86.86606
)), .Names = c("LATITUDE", "LONGITUD"), class = "data.frame", row.names = c(NA, 
-43L))

The full data frame has 30,800 entries (geo locations); the data above is a sample.

I cannot use k-means, because it requires the number of clusters to be specified in advance, and I don't know that number here. Clusters should consist of 4 or more points that are within about 600 ft of each other.

As an initial step, I tried to plot all the latitude and longitude points to get an idea of what the data looks like, so that I can later compare this plot with the plot of the clusters that are formed.

plot(dbfvar[,1], dbfvar[,2], type="l") #dbfvar is the dataframe having above data.

The plot is not satisfactory.
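A likely culprit is type="l", which joins consecutive rows with line segments; for unordered locations a scatter plot is more informative. A minimal sketch using the first seven rows of the sample, with longitude on the x axis so the plot reads like a map. The asp value is an assumption: it roughly corrects for degrees of longitude being shorter than degrees of latitude at this latitude.

```r
# First seven rows of the question's sample data
dbfvar <- data.frame(
  LATITUDE = c(32.70132, 34.74251, 32.55205, 32.64144, 34.92803, 32.38016, 32.42127),
  LONGITUD = c(-85.52518, -86.88351, -87.34777, -85.35430, -87.81506, -86.29790, -87.08690)
)

# Scatter plot (the default type = "p") instead of type = "l", which
# would connect the points with lines in row order.
# Longitude on x and latitude on y so the plot reads like a map. asp makes
# one degree of latitude 1/cos(latitude) times as long as one degree of
# longitude on screen (a rough equirectangular correction).
plot(dbfvar$LONGITUD, dbfvar$LATITUDE, pch = 20,
     xlab = "Longitude", ylab = "Latitude",
     asp = 1 / cos(mean(dbfvar$LATITUDE) * pi / 180))
```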

The main part is to create the clusters and obtain the centroids of them, and visualize the centroids of the clusters formed.

P.S.: I am not restricted to R; I can use Python as well. I am looking for a good solution to the above problem before I go ahead and apply it to 7 such files (each with 30,800 geo locations).

Recommended answer

Hierarchical clustering is one approach.

First you construct a dendrogram:

dend <- hclust(dist(theData), method="complete")

I am using "complete" linkage here, so that groups are merged according to the maximum-distance rule. This will be useful later if we want to make sure that all of the points in one group are at most a certain distance apart.
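To see what the maximum-distance rule means, here is a tiny one-dimensional example comparing complete and single linkage. The merge heights that hclust() reports are the distances at which groups were joined:

```r
# Three points on a line, at positions 0, 1 and 5
x <- c(0, 1, 5)
d <- dist(x)          # pairwise distances: 1 (0-1), 5 (0-5), 4 (1-5)

hc_complete <- hclust(d, method = "complete")
hc_single   <- hclust(d, method = "single")

# Both first merge the points at 0 and 1 (distance 1). When {0, 1} is then
# joined with {5}, complete linkage uses the largest member distance (5),
# single linkage the smallest (4).
hc_complete$height  # 1 5
hc_single$height    # 1 4
```

So with complete linkage, a merge height of h means every pair of points in the merged group is at most h apart.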

I chose a cut height of 2 (because I am not sure how to convert your latitudes and longitudes to feet; you should convert first and then use 600 instead of 2). Here is the resulting dendrogram, cut at a height of 2:
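One way to do the conversion, sketched here under the assumption of a spherical Earth with mean radius of about 6,371 km, is to compute haversine great-circle distances in feet and pass them to hclust() in place of dist(theData). The function name haversine_dist_ft is hypothetical, not from any package. For scale, 600 ft is only about 0.0016 degrees of latitude. Note that dist() and hclust() on all 30,800 points involve about 474 million pairwise distances (roughly 3.8 GB as doubles), and this outer()-based sketch builds several full n x n matrices on top of that, so it suits the sample better than the full files.

```r
# Great-circle (haversine) distances in feet between all pairs of points,
# returned as a "dist" object so it can be passed directly to hclust().
# Assumes a spherical Earth with mean radius 6,371,008.8 m, converted to feet.
haversine_dist_ft <- function(lat, lon, radius_ft = 6371008.8 / 0.3048) {
  phi    <- lat * pi / 180                   # latitudes in radians
  lambda <- lon * pi / 180                   # longitudes in radians
  dphi    <- outer(phi, phi, "-")            # n x n latitude differences
  dlambda <- outer(lambda, lambda, "-")      # n x n longitude differences
  a <- sin(dphi / 2)^2 + outer(cos(phi), cos(phi)) * sin(dlambda / 2)^2
  a[a > 1] <- 1                              # guard against rounding above 1
  as.dist(2 * radius_ft * asin(sqrt(a)))     # keep the lower triangle only
}

# First three rows of the question's sample data
lat <- c(32.70132, 34.74251, 32.55205)
lon <- c(-85.52518, -86.88351, -87.34777)

d_ft      <- haversine_dist_ft(lat, lon)
dend_ft   <- hclust(d_ft, method = "complete")
groups_ft <- cutree(dend_ft, h = 600)        # cut at 600 ft instead of 2 degrees
```

With the full data frame the same pattern would be hclust(haversine_dist_ft(theData$LATITUDE, theData$LONGITUD), method = "complete") followed by cutree(..., h = 600). The three sample points above are many miles apart, so each ends up in its own cluster.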

plot(dend, hang=-1)
abline(h=2, col="red", lty=2)  # dashed red line at the cut height, spanning the whole plot

Now each subtree intersected by the red line will become one cluster.

groups <- cutree(dend, h=2) # cutree() takes the tree, not the data; change "h" to 600 after converting to feet.
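cutree() works on the tree returned by hclust(), and h is the height at which the tree is cut. A toy one-dimensional example:

```r
# Toy data: a tight group of three points and a tight group of two
x <- c(0, 0.5, 1, 10, 10.5)
tree <- hclust(dist(x), method = "complete")

# Cut the tree at height 2; clusters are numbered in order of appearance
groups <- cutree(tree, h = 2)
groups  # 1 1 1 2 2

# Alternatively, ask for a fixed number of clusters with k
cutree(tree, k = 2)  # same result here
```

Cutting by height (h) rather than by count (k) is what matches the question, since the number of hotspots is not known in advance.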

We can plot them as a scatter plot to see how they look:

plot(theData, col=groups)

Promising: nearby points form clusters, which is what we wanted.
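The question also asks for clusters of at least 4 points, which cutree() does not enforce by itself. One way to add that requirement, sketched here on made-up toy coordinates, is to count the members of each group, keep only groups with 4 or more, and compute their centroids with aggregate():

```r
# Made-up toy coordinates: a group of 4 points, a group of 5 points,
# and 2 isolated points
pts <- data.frame(
  x = c(0, 0.1, 0.2, 0.1,  5, 5.1, 5.2, 5.1, 5.0,  9, 20),
  y = c(0, 0.1, 0.0, 0.2,  5, 5.0, 5.1, 5.2, 5.1,  9, 20)
)
groups <- cutree(hclust(dist(pts), method = "complete"), h = 2)

sizes  <- table(groups)                          # members per group
keep   <- as.integer(names(sizes)[sizes >= 4])   # groups with 4+ points
in_hot <- groups %in% keep                       # rows belonging to those groups

# Centroid (mean x and mean y) of each group that is large enough
centroids <- aggregate(pts[in_hot, ], by = list(cluster = groups[in_hot]), FUN = mean)
centroids
```

On the real data the tree would be built from distances in feet and cut at 600; for clusters only a few hundred feet across, the plain mean of latitudes and longitudes should be an adequate centroid.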

Let's add the group centers, and circles of radius 1 around them (so that the maximum distance between any two points inside a circle is 2):

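G1 <- tapply(theData[,1], groups, mean)  # mean of column 1 for each group
G2 <- tapply(theData[,2], groups, mean)  # mean of column 2 for each group

library(plotrix)  # for draw.circle()
plot(theData, col=groups)
points(G1, G2, col=seq_along(G1), cex=2, pch=19)  # one color per group, matching col=groups
for(i in seq_along(G1)) {  # draw a circle of radius 1 (half the cut height) around each center
    draw.circle(G1[i], G2[i], 1, border=i, lty=3, lwd=3)
}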

Drawing circles around the means is clearly not the best way to enclose all of the points in a cluster. Nevertheless, it can be verified visually that the maximum distance between points in one group is 2 (just try shifting the circles a bit so that they enclose all of the points).
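Instead of checking by eye, the largest distance within each group can be computed directly. This sketch uses random toy points (an assumption for illustration only) and confirms that, with complete linkage cut at height h, no cluster contains two points more than h apart:

```r
set.seed(42)
pts <- matrix(runif(200), ncol = 2)   # 100 random toy points in the unit square
h <- 0.3                              # cut height

groups <- cutree(hclust(dist(pts), method = "complete"), h = h)

# Largest pairwise distance (diameter) inside each group
d <- as.matrix(dist(pts))
diameters <- sapply(split(seq_len(nrow(pts)), groups),
                    function(idx) max(d[idx, idx, drop = FALSE]))

all(diameters <= h)   # TRUE: no group has two points more than h apart
```

The same check applied to a tree built from distances in feet, cut at h = 600, would confirm the 600 ft requirement for every cluster.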
