Drawing maps based on census data using ggplot2


Problem description


    I have a list of points that I want to overlay on a map of San Francisco using ggplot2. Each point is a longitude, latitude pair. I want the resulting map to be in a longitude/latitude coordinate system. I managed to reproduce Hadley Wickham's directions for plotting polygon shapefiles using his example file. I am using R 2.15.1 for Windows.

    However, I tried to use cdp files downloaded from the UScensus2010cdp package. Here's my code snippet:

    require("rgdal") 
    require("maptools")
    require("ggplot2")
    require("sp")
    require("plyr")
    gpclibPermit() # required for fortify method
    require(UScensus2010)
    require(UScensus2010cdp)
    data(california.cdp10)
    sf <- city(name = "san francisco", state="ca")
    sf.points = fortify(sf)
    

    I get the following error:

    Using name to define regions.
    Error in unionSpatialPolygons(cp, invert(polys)) : input lengths differ
    In addition: Warning message:
    In split(as.numeric(row.names(attr)), addNA(attr[, region], TRUE)) :
       NAs introduced by coercion
    

    Does anybody know:

    1. What is a good value to give to the region parameter of fortify()?
    2. If that fails, a source of map data with untransformed lat/long coordinates for San Francisco that ggplot2 can draw?
    3. Alternatively, I found here another map of San Francisco, whose data is translated. Can you tell me how to either translate this data to raw lat/long or make the reverse translation for my set of points?

    Answer

    note:

    The issue

    The issue arises from the fact that fortify.SpatialPolygonsDataFrame relies on converting the row.names to numeric, and the rownames of your data are the identifiers.

    ggplot2:::fortify.SpatialPolygonsDataFrame 
    
    function (model, data, region = NULL, ...) 
    {
        attr <- as.data.frame(model)
        if (is.null(region)) {
            region <- names(attr)[1]
            message("Using ", region, " to define regions.")
        }
        polys <- split(as.numeric(row.names(attr)), addNA(attr[, 
            region], TRUE))
        cp <- polygons(model)
        try_require(c("gpclib", "maptools"))
        unioned <- unionSpatialPolygons(cp, invert(polys))
        coords <- fortify(unioned)
        coords$order <- 1:nrow(coords)
        coords
    }
    

    In your case

    row.names(sf@data)
    ## [1] "california_586" "california_590" "california_616"
    

    are the identifiers you want to use as the region parameter, because place, state, and name do not uniquely identify the three polygons.

    # as.character used to coerce from factor
    lapply(lapply(sf@data[,c('place','state','name')], unique), as.character)
    ## $place
    ## [1] "67000"
    ## 
    ## $state
    ## [1] "06"
    ## 
    ## $name
    ## [1] "San Francisco"
    

    Because the row names are a character vector whose elements begin with alphabetic characters, coercing them to numeric yields NA:

    as.numeric(rownames(sf@data))
    ## [1] NA NA NA
    ## Warning message:
    ## NAs introduced by coercion
    

    This is the warning reported alongside the error in the question.
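To see where those NA values go next, the split() call from the function body can be reproduced in base R. This is only a minimal sketch: it assumes the three row names and the single shared name shown above, and the variable names (ids, region, row_numbers) are illustrative rather than taken from ggplot2.

```r
# Row names and region values copied by hand from the output above
# (assumed to match sf@data).
ids    <- c("california_586", "california_590", "california_616")
region <- factor(rep("San Francisco", 3))

# The same coercion the fortify method performs. The warning is suppressed
# here only to keep the output quiet.
row_numbers <- suppressWarnings(as.numeric(ids))

# Group the (now missing) row numbers by region, as the function body does.
polys <- split(row_numbers, addNA(region, TRUE))

length(polys)            # number of regions found
all(is.na(polys[[1]]))   # whether every polygon index in that region is NA
```

With no usable polygon indices left, the lookup handed to unionSpatialPolygons presumably no longer lines up with the three polygons, which would be consistent with the "input lengths differ" error.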

    Solution

    1. Copy the row names into a new column, so the identifiers are kept
    2. Reset the row.names to NULL or 1:nrow(sf@data), so they can be coerced to numeric
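The effect of these two steps can be checked on a plain data frame first. The sketch below uses a toy stand-in for sf@data (the data frame d and its values are hypothetical) and confirms that the row names coerce cleanly once they have been reset.

```r
# Toy stand-in for sf@data (hypothetical values, same style of row names).
d <- data.frame(
  name = rep("San Francisco", 3),
  row.names = c("california_586", "california_590", "california_616")
)

# Step 1: keep the identifiers in an ordinary column.
d[["place_id"]] <- rownames(d)

# Step 2: reset the row names to the default 1, 2, 3.
row.names(d) <- NULL

# The coercion that previously produced NA now succeeds.
as.numeric(row.names(d))
```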

    Applied to the San Francisco data:

    # rownames
    sf@data[['place_id']] <- rownames(sf@data)
    row.names(sf@data) <- NULL
    
    # fortify
    sf_ggplot <- fortify(sf, region = 'place_id')
    # merge to add the original data
    sf_ggplot_all <- merge(sf_ggplot, sf@data, by.x = 'id', by.y = 'place_id')
    # very basic and uninteresting plot
    ggplot(sf_ggplot_all,aes(x=long,y=lat, group = group)) + 
      geom_polygon(aes(fill =pop2000)) + 
      coord_map()
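Since the original goal was to overlay longitude/latitude points on the map, a point layer can be added on top of the polygon layer. The following is a hedged, self-contained sketch: the outline data frame is a hypothetical square standing in for the fortified polygons, the pts coordinates are made up, and coord_quickmap() is swapped in for coord_map() only so the example does not depend on the mapproj package.

```r
library(ggplot2)

# Hypothetical stand-in for the fortified polygons: one square in
# longitude/latitude, using the long/lat/group column names from above.
outline <- data.frame(
  long  = c(-122.52, -122.35, -122.35, -122.52),
  lat   = c(37.70, 37.70, 37.82, 37.82),
  group = "a.1"
)

# Hypothetical points to overlay (longitude, latitude pairs).
pts <- data.frame(
  lon = c(-122.42, -122.45),
  lat = c(37.77, 37.75)
)

# Each layer gets its own data frame, so the points need no group column.
p <- ggplot() +
  geom_polygon(data = outline, aes(x = long, y = lat, group = group),
               fill = "grey80", colour = "grey40") +
  geom_point(data = pts, aes(x = lon, y = lat), colour = "red") +
  coord_quickmap()
p
```

With the real data, outline would be replaced by sf_ggplot_all and pts by the actual list of points, provided both are in untransformed longitude/latitude.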
    
