R中的Choropleth贴图-TIGER Shapefile问题 [英] Choropleth Maps in R - TIGER Shapefile issue

查看:91
本文介绍了R中的Choropleth贴图-TIGER Shapefile问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

对使用R进行映射有疑问,特别是在R中的拟色映射附近.

Have a Question on Mapping with R, specifically around the choropleth maps in R.

我有一个分配给are的邮政编码和一些相关数据的数据集(数据集为此处).

I have a dataset of ZIP codes assigned to an are and some associated data (dataset is here).

我的最终数据格式是:区域ID,ZIP,概率值,客户数,区域概率和区域客户总数.我正在尝试通过在地图上绘制区域概率和区域客户总数来显示此数据.我曾经尝试过使用人口普查TIGER Shapefile来做到这一点,但我猜R不能处理整个国家.

My final data format is: Area ID, ZIP, Probability Value, Customer Count, Area Probability and Area Customer Total. I am attempting to present this data by plotting area probability and Area Customer Total on a Map. I have tried to do this by using the census TIGER Shapefiles but I guess R cannot handle the complete country.

我对统计功能感到满意,现在我将所有Mapping从第三方GIS集中的应用程序转移到R中进行我的所有Mapping.是否有人对如何在R中实现此目标有任何指示?

I am comfortable with the Statistical capabilities and now I am moving all my Mapping from third party GIS focused applications to doing all my Mapping in R. Does anyone have any pointers to how to achieve this from within R?

详细一点,这是R停止工作的地方-

To be a little more detailed, here's the point where R stops working -

shapes <- readShapeSpatial("tl_2013_us_zcta510.shp")

(其中shp文件是人口普查/老虎)形状文件.

(where the shp file is the census/TIGER) shape file.

编辑-提供更多详细信息.我试图首先阅读TIGER shapefile,希望将此空间数据集与我的数据结合起来并最终进行绘图.尝试读取形状文件时,一开始我就遇到了问题.下面是带有输出的代码

Edit - Providing further details. I am trying to first read the TIGER shapefiles, hoping to combine this spatial dataset with my data and eventually plot. I am having an issue at the very beginning when attempting to read the shape file. Below is the code with the output

require(maptools)
shapes<-readShapeSpatial("tl_2013_us_zcta510.shp")

Error: cannot allocate vector of size 317 Kb

推荐答案

有一些使用R制作地图的示例和教程,但是大多数都是非常通用的,不幸的是,大多数地图项目的细微差别会产生难以理解的问题.您的例子就是一个例子.

There are several examples and tutorials on making maps using R, but most are very general and, unfortunately, most map projects have nuances that create inscrutable problems. Yours is a case in point.

我遇到的最大问题是整个美国的美国人口普查局邮政编码制表区shapefile很大:〜800MB.使用readOGR(...)加载时,R SpatialPolygonDataFrame对象约为913MB.至少在我的系统上,尝试处理这种大小的文件(例如,使用fortify(...)转换为数据帧)会导致类似您在上面识别的错误.因此,解决方案是根据数据中实际存在的邮政编码对文件进行子集化.

The biggest issue I came across was that the US Census Bureau zip code tabulation area shapefile for the whole US is huge: ~800MB. When loaded using readOGR(...) the R SpatialPolygonDataFrame object is about 913MB. Trying to process a file this size, (e.g., converting to a data frame using fortify(...)), at least on my system, resulted in errors like the one you identified above. So the solution is to subset the file based in the zip codes that are actually in your data.

此地图:

是使用以下代码从您的数据中提取出来的.

was made from your data using the following code.

library(rgdal)
library(ggplot2)
library(stringr)
library(RColorBrewer)

setwd("<directory containing shapfiles and sample data>")

data     <- read.csv("Sample.csv",header=T) # your sample data, downloaded as csv
data$ZIP <- str_pad(data$ZIP,5,"left","0") # convert ZIP to char(5) w/leading zeros

zips     <- readOGR(dsn=".","tl_2013_us_zcta510") # import zip code polygon shapefile
map      <- zips[zips$ZCTA5CE10 %in% data$ZIP,]   # extract only zips in your Sample.csv
map.df   <- fortify(map)        # convert to data frame suitable for plotting
# merge data from Samples.csv into map data frame
map.data <- data.frame(id=rownames(map@data),ZIP=map@data$ZCTA5CE10)
map.data <- merge(map.data,data,by="ZIP")
map.df   <- merge(map.df,map.data,by="id")
# load state boundaries
states <- readOGR(dsn=".","gz_2010_us_040_00_5m")
states <- states[states$NAME %in% c("New York","New Jersey"),] # extract NY and NJ
states.df <- fortify(states)    # convert to data frame suitable for plotting

ggMap <- ggplot(data = map.df, aes(long, lat, group = group)) 
ggMap <- ggMap + geom_polygon(aes(fill = Probability_1))
ggMap <- ggMap + geom_path(data=states.df, aes(x=long,y=lat,group=group))
ggMap <- ggMap + scale_fill_gradientn(name="Probability",colours=brewer.pal(9,"Reds"))
ggMap <- ggMap + coord_equal()
ggMap

说明:

rgdal包有助于从ESRI shapefile创建R Spatial对象.在您的情况下,我们将多边形shapefile导入到R中的SpatialPolygonDataFrame对象中.后者有两个主要部分:一个多边形部分,其中包含将要在地图上创建多边形的纬度和经度点,以及一个数据部分其中包含有关多边形的信息(因此,每个多边形一行).例如,如果我们将空间对象称为map,则可以将这两个部分引用为map@polygonsmap@data.制作Choropleth贴图的基本挑战是将Sample.csv文件中的数据与相关的多边形(邮政编码)相关联.

The rgdal package facilitates the creation of R Spatial objects from ESRI shapefiles. In your case we are importing a polygon shapefile into a SpatialPolygonDataFrame object in R. The latter has two main parts: a polygon section, which contains the latitude and longitude points that will be joined to create the polygons on the map, and a data section which contains information about the polygons (so, one row for each polygon). If, e.g., we call the Spatial object map, then the two sections can be referenced as map@polygons and map@data. The basic challenge in making choropleth maps is to associate data from your Sample.csv file, with the relevant polygons (zip codes).

因此基本工作流程如下:

So the basic workflow is as follows:

1. Load polygon shapefiles into Spatial object ( => zips)
2. Subset if appropriate ( => map).
3. Convert to data frame suitable for plotting ( => map.df).
4. Merge data from Sample.csv into map.df.
5. Draw the map.

步骤4是导致所有问题的步骤.首先,我们必须将邮政编码与每个多边形相关联.然后,我们必须将Probability_1与每个邮政编码关联.这是一个三步过程.

Step 4 is the one that causes all the problems. First we have to associate zip codes with each polygon. Then we have to associate Probability_1 with each zip code. This is a three step process.

空间数据文件中的每个多边形都有唯一的ID,但是这些ID不是邮政编码.多边形ID作为行名存储在map@data中.邮政编码存储在map@dataZCTA5CE10列中.因此,首先我们必须创建一个将map@data行名(id)与map@data$ZCTA5CE10(ZIP)关联的数据框.然后,我们使用两个数据帧中的ZIP字段将您的Sample.csv与结果合并.然后,将其结果合并到map.df中.这可以用3行代码完成.

Each polygon in the Spatial data file has a unique ID, but these ID's are not the zip codes. The polygon ID's are stored as row names in map@data. The zip codes are stored in map@data, in column ZCTA5CE10. So first we must create a data frame that associates the map@data row names (id) with map@data$ZCTA5CE10 (ZIP). Then we merge your Sample.csv with the result using the ZIP field in both data frames. Then we merge the result of that into map.df. This can be done in 3 lines of code.

绘制地图涉及到告诉ggplot使用哪个数据集(map.df),用于x和y的列(长和纬度)以及如何按多边形对数据进行分组(group = group). map.df中的longlatgroup列都是通过调用fortify(...)创建的.对geom_polygon(...)的调用告诉ggplot绘制多边形并使用map.df$Probability_1中的信息进行填充.对geom_path(...)的调用告诉ggplot创建具有状态边界的层.对scale_fill_gradientn(...)的调用告诉ggplot使用基于颜色调制器"Reds"调色板的颜色方案.最后,对coord_equal(...)的调用告诉ggplot对x和y使用相同的比例,这样地图就不会失真.

Drawing the map involves telling ggplot what dataset to use (map.df), which columns to use for x and y (long and lat) and how to group the data by polygon (group=group). The columns long, lat, and group in map.df are all created by the call to fortify(...). The call to geom_polygon(...) tells ggplot to draw polygons and fill using the information in map.df$Probability_1. The call to geom_path(...) tells ggplot to create a layer with state boundaries. The call to scale_fill_gradientn(...) tells ggplot to use a color scheme based on the color brewer "Reds" palette. Finally, the call to coord_equal(...) tells ggplot to use the same scale for x and y so the map is not distorted.

NB:状态边界层使用美国各州TIGER文件.

NB: The state boundary layer, uses the US States TIGER file.

这篇关于R中的Choropleth贴图-TIGER Shapefile问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆