Summarizing Latitude, Longitude, and Counts Data for ggplot Usage


Problem description


I have been provided with some customer data in Latitude, Longitude, and Counts format. All the data I need to create a ggplot heatmap is present, but I do not know how to put it into the format ggplot requires.

I am trying to aggregate the data by total counts within 0.01 Lat and 0.01 Lon blocks (typical heatmap), and I instinctively thought "tapply". This creates a nice summary by block size, as desired, but the format is wrong. Furthermore, I would really like to have empty Lat or Lon block values be included as zeroes, even if there is nothing there... otherwise the heatmap ends up looking streaky and odd.

Your help is greatly appreciated.

I have created a subset of my data for your reference in the code below:

# m is the matrix of data provided
m = matrix(c(44.9591051,44.984884,44.984884,44.9811399,
           44.9969096,44.990894,44.9797023,44.983334,
          -93.3120017,-93.297668,-93.297668,-93.2993524,
          -93.2924484,-93.282462,-93.2738911,-93.26667,
          69,147,137,22,68,198,35,138), nrow=8, ncol=3) 
colnames(m) <- c("Lat", "Lon", "Count")
m <- as.data.frame(m)
s = as.data.frame((tapply(m$Count, list(round(m$Lon,2), round(m$Lat,2)), sum)))
s[is.na(s)] <- 0

# Data frame "s" has all the data, but not exactly in the format desired...
# First, it has a column for each latitude, instead of one column for Lon
# and one for Lat, and second, it needs to have 0 as the entry data for 
# Lat / Lon pairs that have no other data. As it is, there are only zeroes
# when one of the other entries has a Lat or Lon that matches... if there
# are no entries for a particular Lat or Lon value, then nothing at all is
# reported.

desired.format = matrix(c(44.96,44.96,44.96,44.96,44.96,
    44.97,44.97,44.97,44.97,44.97,44.98,44.98,44.98,
    44.98,44.98,44.99,44.99,44.99,44.99,44.99,45,45,
    45,45,45,-93.31,-93.3,-93.29,-93.28,-93.27,-93.31,
    -93.3,-93.29,-93.28,-93.27,-93.31,-93.3,-93.29,
    -93.28,-93.27,-93.31,-93.3,-93.29,-93.28,-93.27,
    -93.31,-93.3,-93.29,-93.28,-93.27,69,0,0,0,0,0,0,
    0,0,0,0,306,0,0,173,0,0,0,198,0,0,0,68,0,0),
    nrow=25, ncol=3)

colnames(desired.format) <- c("Lat", "Lon", "Count")
desired.format <- as.data.frame(desired.format)

library(ggmap)  # provides get_map() and ggmap(); also attaches ggplot2
minneapolis = get_map(location = "minneapolis, mn", zoom = 12)
ggmap(minneapolis) + geom_tile(data = desired.format, aes(x = Lon, y = Lat, alpha = Count), fill="red")

Solution

Here is a stab with geom_hex and stat_density2d. The idea of making bins by truncating coordinates makes me a bit uneasy.

What you have is count data with lat/longs given, which means ideally you would need a weight parameter, but as far as I know that is not implemented in geom_hex. Instead, we hack it by repeating each row as many times as its Count value, similar to the approach here.
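To make the row-repetition idea concrete, here is a minimal sketch on a toy data frame. The name `toy` and its values are made up for illustration and are not part of the question's data.

```r
# Toy data: hypothetical values, for illustration only
toy <- data.frame(Lat   = c(44.96, 44.98),
                  Lon   = c(-93.31, -93.30),
                  Count = c(2, 3))

# rep() repeats each row index as many times as that row's Count
idx <- rep(seq_len(nrow(toy)), times = toy$Count)

# Indexing by the repeated indices yields one row per counted observation
toy_long <- toy[idx, ]

# The long table has exactly sum(Count) rows
nrow(toy_long) == sum(toy$Count)
```

The cost of this approach is memory: the long table grows with the total count rather than with the number of locations, which is acceptable for small samples like the one in the question.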

  ## hack job to repeat records to full count
  m <- as.data.frame(m)
  m_long <- with(m, m[rep(1:nrow(m), Count), ])


  ## stat_density2d
  ## (current ggplot2 accepts a single geom name per layer; the contour
  ## lines are drawn by the geom_density2d layer below)
  ggplot(m_long, aes(Lat, Lon)) +
    stat_density2d(aes(alpha = ..level.., fill = ..level..), size = 2,
                   bins = 10, geom = "polygon") +
    scale_fill_gradient(low = "blue", high = "red") +
    geom_density2d(colour = "black", bins = 10) +
    geom_point(data = m_long)


  ## geom_hex alternative (requires the hexbin package)
  bins <- 6
  ggplot(m_long, aes(Lat, Lon)) +
    geom_hex(bins = bins) +
    coord_equal(ratio = 1/1) +
    scale_fill_gradient(low = "blue", high = "red") +
    geom_point(data = m_long, position = "jitter") +
    ## label each hexagon with its count; text size is fixed at 3.5
    stat_binhex(aes(label = ..count..), size = 3.5, geom = "text",
                bins = bins, colour = "white")

These produce, respectively, a density contour plot and a hex-binned version (images not reproduced here).

EDIT:

With basemap:

# `map` is not defined in the original answer; it is assumed here to be a
# ggmap object such as ggmap(minneapolis) from the question
map +
  stat_density2d(data = m_long,
                 aes(x = Lon, y = Lat, alpha = ..level.., fill = ..level..),
                 size = 2,
                 bins = 10,
                 geom = "polygon",  # a single geom name per layer
                 inherit.aes = FALSE) +
  scale_fill_gradient(low = "blue", high = "red") +
  geom_density2d(data = m_long, aes(x = Lon, y = Lat),
                 colour = "black", bins = 10, inherit.aes = FALSE) +
  geom_point(data = m_long, aes(x = Lon, y = Lat), inherit.aes = FALSE)


## and the hexbin map...

map +
  geom_hex(bins = bins, data = m_long, aes(x = Lon, y = Lat), alpha = .5,
           inherit.aes = FALSE) +
  geom_point(data = m_long, aes(x = Lon, y=Lat),
             inherit.aes=FALSE,position = "jitter")+
  scale_fill_gradient(low = "blue", high = "red")
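
If the tiled format from the question is still wanted, the following is one possible sketch in base R. It is not part of the original answer. It rounds to 0.01 degree cells as the question does, sums counts per cell, and merges the sums onto a full grid so that empty cells appear with a count of zero. The names `lat_i`, `lon_i`, `sums`, `grid`, and `tiles` are introduced here for illustration only.

```r
# Sample data from the question
m <- data.frame(
  Lat   = c(44.9591051, 44.984884, 44.984884, 44.9811399,
            44.9969096, 44.990894, 44.9797023, 44.983334),
  Lon   = c(-93.3120017, -93.297668, -93.297668, -93.2993524,
            -93.2924484, -93.282462, -93.2738911, -93.26667),
  Count = c(69, 147, 137, 22, 68, 198, 35, 138)
)

# Work in integer hundredths of a degree so that cell keys match exactly
# (joining on rounded doubles can miss matches)
m$lat_i <- as.integer(round(m$Lat * 100))
m$lon_i <- as.integer(round(m$Lon * 100))

# Sum counts per occupied cell; the result is long: one row per cell
sums <- aggregate(Count ~ lat_i + lon_i, data = m, FUN = sum)

# Every cell inside the bounding box, including the empty ones
grid <- expand.grid(lat_i = seq(min(m$lat_i), max(m$lat_i)),
                    lon_i = seq(min(m$lon_i), max(m$lon_i)))

# Left join onto the full grid, then turn missing cells into zeros
tiles <- merge(grid, sums, all.x = TRUE)
tiles$Count[is.na(tiles$Count)] <- 0

# Convert the integer keys back to degrees and keep the three columns
tiles$Lat <- tiles$lat_i / 100
tiles$Lon <- tiles$lon_i / 100
tiles <- tiles[, c("Lat", "Lon", "Count")]
```

The resulting `tiles` data frame has the same three columns as `desired.format` in the question, so it can be passed to geom_tile in the same way. Note that this still bins by rounding coordinates, which is the step the answer above expresses some unease about.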
