如何将来自美国人口普查局的国家级形状文件结合到全国范围内 [英] How to combine state-level shapefiles from the united states census bureau into a nationwide shape

查看:235
本文介绍了如何将来自美国人口普查局的国家级形状文件结合到全国范围内的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

人口普查局不提供全国范围的公共使用微数据区(美国社区调查中最小的地理位置)。我尝试将它们全部与几种不同的方法结合在一起,但即使是一旦遇到加利福尼亚州的破坏标识符就会被破坏。我做一些愚蠢的事情,还是需要一个困难的解决方法?这里的代码可以重现,直到事情发生了。

  library(taRifx.geo)
library(maptools)

td< - tempdir(); tf< - tempfile()
setInternet2(TRUE)
download.file(ftp://ftp2.census.gov/geo/tiger/TIGER2014/PUMA/,tf)

al< - readLines(tf)
tl< - al [grep(geo / tiger / TIGER2014 / PUMA / tl_2014_,al)]
fp < - gsub (。*)geo / tiger / TIGER2014 / PUMA / tl_2014 _([0-9] *)_ puma10\\.zip(。*),\\2,t1)

#摆脱阿拉斯加
fp< - fp [fp!='02']

af< - paste0(ftp://ftp2.census.gov/geo / tiger / TIGER2014 / PUMA / tl_2014_,fp,_puma10.zip)

d< - NULL
(i in af){
try(file.remove (z),silent = TRUE)
download.file(i,tf,mode ='wb')
z< - unzip(tf,exdir = td)
b< - readShapePoly z [grep('shp $',z)])
if(is.null(d))d< - b else d< - taRifx.geo ::: rbind.SpatialPolygonsDataFrame(d,b, fix.duplicated.IDs = TRUE)
}

#在`row.names< - 。data.frame`中的错误(`* tmp *`,value = c(d.0 ,d.1,d.2 ,
#duplicate'row.names'不允许
#另外:警告消息:
#设置'row.names'时的非唯一值:'d.0' ,'d.1','d.10','d.11','d.12','d.13','d.14','d.15','d.16' d.17,d.18,d.19,d.2,d.3,d.4,d.5,d.6 7','d.8','d.9'


解决方案

这是另一种方法,其中包括获取FTP目录列表的快捷方式。正如@Pop所提到的,关键是要确保这些ID都是唯一的。

 库(RCurl)
库(rgdal)

#获取目录列表
u< - 'ftp://ftp2.census.gov/geo/tiger/TIGER2014/PUMA/'
f< - paste0(u,strsplit(getURL(u,ftp.use.epsv = FALSE,ftplistonly = TRUE),
'\\s +')[[1]])

#下载并提取到tempdir / shps
invisible(sap(f,function(x){
path< - file.path(tempdir(),basename(x))
download.file (x,destfile = path,mode ='wb')
unzip(path,exdir = file.path(tempdir(),'shps'))
}))

#读取所有的shps,并将shapefile名称添加到ID
shps< - lapply(sub('\\.zip',',basename(f)),function(x){
shp< - readOGR(file.path(tempdir(),'shps'),x)
shp< - spChFIDs(shp,paste0(x,'_',sapply(slot(shp,多边形),插槽,ID)))
shp
})

#rbind到一个罪gle对象
shp< - do.call(rbind,as.list(shps))

#plot(注:剪切到连续状态用于显示目的)
plot shp,axis = T,xlim = c(-130,-60),ylim = c(20,50),las = 1)

#写出wd / USA.shp
writeOGR(shp,'。','USA','ESRI Shapefile')


The census bureau doesn't provide a nationwide shapefile of public use microdata areas (the smallest geography available on the American Community Survey). I tried combining them all with a few different methods, but even the one that de-dupes identifiers breaks once it hits California. Am I doing something silly or does this require a difficult workaround? Here's code to reproduce up to the point where things break.

library(taRifx.geo)
library(maptools)

td <- tempdir() ; tf <- tempfile()
setInternet2( TRUE )
download.file( "ftp://ftp2.census.gov/geo/tiger/TIGER2014/PUMA/" , tf )

al <- readLines( tf )
tl <- al[ grep( "geo/tiger/TIGER2014/PUMA/tl_2014_" , al ) ]
fp <- gsub( "(.*)geo/tiger/TIGER2014/PUMA/tl_2014_([0-9]*)_puma10\\.zip(.*)" , "\\2" , tl )

# get rid of alaska
fp <- fp[ fp != '02' ]

af <- paste0( "ftp://ftp2.census.gov/geo/tiger/TIGER2014/PUMA/tl_2014_" , fp , "_puma10.zip" )

d <- NULL
for ( i in af ){
    try( file.remove( z ) , silent = TRUE )
    download.file( i , tf , mode = 'wb' )
    z <- unzip( tf , exdir = td )
    b <- readShapePoly( z[ grep( 'shp$' , z ) ] )
    if ( is.null( d ) ) d <- b else d <- taRifx.geo:::rbind.SpatialPolygonsDataFrame( d , b , fix.duplicated.IDs = TRUE )
}

# Error in `row.names<-.data.frame`(`*tmp*`, value = c("d.0", "d.1", "d.2",  : 
  # duplicate 'row.names' are not allowed
# In addition: Warning message:
# non-unique values when setting 'row.names': ‘d.0’, ‘d.1’, ‘d.10’, ‘d.11’, ‘d.12’, ‘d.13’, ‘d.14’, ‘d.15’, ‘d.16’, ‘d.17’, ‘d.18’, ‘d.19’, ‘d.2’, ‘d.3’, ‘d.4’, ‘d.5’, ‘d.6’, ‘d.7’, ‘d.8’, ‘d.9’ 

解决方案

Here's another approach, which includes a short cut for obtaining the FTP directory listing. As @Pop mentioned, the key is to ensure that the IDs are all unique.

library(RCurl) 
library(rgdal)

# get the directory listing
u <- 'ftp://ftp2.census.gov/geo/tiger/TIGER2014/PUMA/'
f <- paste0(u, strsplit(getURL(u, ftp.use.epsv = FALSE, ftplistonly = TRUE), 
                        '\\s+')[[1]])

# download and extract to tempdir/shps
invisible(sapply(f, function(x) {
  path <- file.path(tempdir(), basename(x))
  download.file(x, destfile=path, mode = 'wb')
  unzip(path, exdir=file.path(tempdir(), 'shps'))
}))

# read in all shps, and prepend shapefile name to IDs
shps <- lapply(sub('\\.zip', '', basename(f)), function(x) {
  shp <- readOGR(file.path(tempdir(), 'shps'), x)
  shp <- spChFIDs(shp, paste0(x, '_', sapply(slot(shp, "polygons"), slot, "ID")))
  shp
})

# rbind to a single object
shp <- do.call(rbind, as.list(shps))

# plot (note: clipping to contiguous states for display purposes)
plot(shp, axes=T, xlim=c(-130, -60), ylim=c(20, 50), las=1)

# write out to wd/USA.shp
writeOGR(shp, '.', 'USA', 'ESRI Shapefile')

这篇关于如何将来自美国人口普查局的国家级形状文件结合到全国范围内的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆