Drawing maps based on census data using ggplot2
Question
I have a list of points that I want to overlay on a map of San Francisco using ggplot2. Each point is a longitude, latitude pair. I want the resulting map to be in a longitude/latitude coordinate system. I managed to reproduce Hadley Wickham's directions for plotting polygon shapefiles using his example file. I am using R 2.15.1 for Windows.
However, I tried to use cdp files downloaded from the UScensus2010cdp package. Here's my code snippet:
    require("rgdal")
    require("maptools")
    require("ggplot2")
    require("sp")
    require("plyr")
    gpclibPermit() # required for fortify method
    require(UScensus2010)
    require(UScensus2010cdp)
    data(california.cdp10)
    sf <- city(name = "san francisco", state = "ca")
    sf.points = fortify(sf)
I get the following error:
    Using name to define regions.
    Error in unionSpatialPolygons(cp, invert(polys)) : input lengths differ
    In addition: Warning message:
    In split(as.numeric(row.names(attr)), addNA(attr[, region], TRUE)) :
      NAs introduced by coercion
Does anybody know:
- What is a good value to give to the region parameter of fortify()?
- If that fails, a source of map data with untransformed lat/long coordinates for San Francisco that ggplot2 can draw?
- Alternatively, I found here another map of San Francisco, whose data is translated. Can you tell me how to either translate this data to raw lat/long or make the reverse translation for my set of points?
Answer
Note: I was unable to access UScensus2010cdp, so I am using UScensus2000cdp, which replicates the error.
The issue
The issue arises from the fact that fortify.SpatialPolygonsDataFrame relies on converting the row.names to numeric, and the rownames of your data are the identifiers.

    ggplot2:::fortify.SpatialPolygonsDataFrame
    function (model, data, region = NULL, ...)
    {
        attr <- as.data.frame(model)
        if (is.null(region)) {
            region <- names(attr)[1]
            message("Using ", region, " to define regions.")
        }
        polys <- split(as.numeric(row.names(attr)), addNA(attr[,
            region], TRUE))
        cp <- polygons(model)
        try_require(c("gpclib", "maptools"))
        unioned <- unionSpatialPolygons(cp, invert(polys))
        coords <- fortify(unioned)
        coords$order <- 1:nrow(coords)
        coords
    }
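To see what the split() step in that function expects, here is a minimal base-R sketch on a toy attribute table. The data frame `attr_ok` and its `region` column are invented for illustration and only stand in for `as.data.frame(model)`.

```r
# Toy attribute table standing in for as.data.frame(model); the column
# name and values are made up for illustration.
attr_ok <- data.frame(region = c("a", "a", "b"), stringsAsFactors = FALSE)

# Default row names are "1", "2", "3", so as.numeric() recovers row indices
# and split() groups those indices by region.
polys_ok <- split(as.numeric(row.names(attr_ok)),
                  addNA(attr_ok[, "region"], TRUE))
print(polys_ok)
```

With default row names, each region maps to the row indices of its polygons, which is the grouping the later union step relies on.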
In your case
    row.names(sf@data)
    ## [1] "california_586" "california_590" "california_616"

are the identifiers you wish to use as the region parameter, as place, state and name do not uniquely identify the three polygons.

    # as.character used to coerce from factor
    lapply(lapply(sf@data[,c('place','state','name')], unique), as.character)
    ## $place
    ## [1] "67000"
    ##
    ## $state
    ## [1] "06"
    ##
    ## $name
    ## [1] "San Francisco"
As a character vector whose elements begin with alphabetic characters, it becomes NA when coerced to numeric:

    as.numeric(rownames(sf@data))
    ## [1] NA NA NA
    ## Warning message:
    ## NAs introduced by coercion

This is one of the warnings given.
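The effect of that coercion on the grouping can be reproduced without the census packages. The sketch below uses a toy data frame, `attr_bad`, whose row names are shaped like the identifiers above; the table itself is an assumption made for illustration, not the real `sf@data`.

```r
# Hypothetical identifier-style row names, shaped like those in sf@data.
ids <- c("california_586", "california_590", "california_616")
attr_bad <- data.frame(name = rep("San Francisco", 3), row.names = ids,
                       stringsAsFactors = FALSE)

# Every identifier fails to parse as a number (warning suppressed here).
idx <- suppressWarnings(as.numeric(row.names(attr_bad)))

# The grouping that fortify() builds therefore holds NA, not row indices.
polys_bad <- split(idx, addNA(attr_bad[, "name"], TRUE))
print(polys_bad)
```

Because the single "San Francisco" group contains only NA values, the union step has no usable row indices to work with, which is consistent with the error reported in the question.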
Solution
- Define a column to hold the rownames
- Set the row.names to NULL or 1:nrow(sf@data)
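The two steps can be checked on a toy table before touching the spatial object. This is a base-R sketch only; `df`, its `pop` column and its values are invented, while `place_id` follows the column name used in this answer.

```r
# Toy table with identifier row names (values are invented).
df <- data.frame(pop = c(10, 20, 30),
                 row.names = c("california_586", "california_590",
                               "california_616"))

df[["place_id"]] <- rownames(df)  # keep the identifiers as a column
row.names(df) <- NULL             # reset row names to "1", "2", "3"

idx <- as.numeric(row.names(df))  # now coerces without NA
polys <- split(idx, addNA(df[, "place_id"], TRUE))
print(polys)
```

After the reset, each identifier maps to exactly one row index, so a region column built this way identifies the three polygons uniquely.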
So:

    # rownames
    sf@data[['place_id']] <- rownames(sf@data)
    row.names(sf@data) <- NULL

    # fortify
    sf_ggplot <- fortify(sf, region = 'place_id')

    # merge to add the original data
    sf_ggplot_all <- merge(sf_ggplot, sf@data, by.x = 'id', by.y = 'place_id')
    # merge() does not guarantee row order; restore the vertex order set by fortify
    sf_ggplot_all <- sf_ggplot_all[order(sf_ggplot_all$order), ]

    # very basic and uninteresting plot
    ggplot(sf_ggplot_all, aes(x = long, y = lat, group = group)) +
      geom_polygon(aes(fill = pop2000)) +
      coord_map()
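The question's original goal was to overlay longitude/latitude points on the map. A minimal sketch of one way to do that is below, assuming the points sit in a data frame with `lon` and `lat` columns. Both `outline` (a stand-in for the fortified polygons) and `pts` are hypothetical, with invented coordinates. `coord_quickmap()` is used so the sketch needs only ggplot2; `coord_map()` as in the plot above should work as well if the mapproj package is installed.

```r
library(ggplot2)

# Stand-in for the fortified polygons: one square with the long/lat/group
# columns that fortify() produces. Coordinates are invented.
outline <- data.frame(
  long  = c(-122.52, -122.35, -122.35, -122.52),
  lat   = c(37.70, 37.70, 37.83, 37.83),
  group = "toy.1"
)

# Hypothetical points to overlay, as longitude/latitude pairs.
pts <- data.frame(lon = c(-122.42, -122.45), lat = c(37.77, 37.75))

# Each layer gets its own data, so the points need no group column.
p <- ggplot() +
  geom_polygon(data = outline, aes(x = long, y = lat, group = group),
               fill = "grey85", colour = "grey40") +
  geom_point(data = pts, aes(x = lon, y = lat), colour = "red") +
  coord_quickmap()
```

With the real data, `outline` would be replaced by `sf_ggplot_all`. Since the polygons stay in longitude/latitude, the points can be plotted without any transformation.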