根据 NLCD(土地覆盖数据)栅格计算的县域面积过大 [英] County area calculated from NLCD (Landcover data) rasters is too large
问题描述
我正在尝试计算美国每个县的土地覆盖重新划分.我已经使用 FedData 包(devtools 版本)获得了 Apache 县的 NLCD,我正在使用人口普查局的县 shapefile.
问题是我得到的区域比官方的和我的 shapefile 中指示的区域大得多,即 51,000km^2 而不是官方的 29,0000km^2.一定有一些我不了解光栅对象的地方,但经过数小时的网络搜索后,我感到非常困惑,感谢任何帮助.
下面介绍使用的代码和计算方法.县数据可以在这里下载:
原因是你得到了墨卡托投影返回的数据.
crs(nlcd_data)CRS 参数:+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext+no_defs
这个坐标参考系统保持形状,这就是它被用于网络映射的原因.它不应该用于大多数其他目的,因为它也因区域失真而臭名昭著.
实事求是的信息是永远不要只相信投影栅格的标称分辨率并假设它是正确的和/或恒定的.可靠计算面积的方法是使用经度/纬度坐标,因为根据定义,这些坐标不会失真.
报告的空间分辨率为
res(nlcd_data)[1] 30 30
因此,您预计单元格的面积为 30 x 30 = 900 m2 也就不足为奇了.但是阿帕奇县的像元大小实际上在 573 到 625 m2 之间.如下图.
先获取数据
库(FedData)县 <- raster::getData("GADM", country="USA", level=2)apache <-子集(县,县$NAME_2==Apache")nlcd <- get_nlcd(apache, year = 2011, label = "nlcd_apache", force.redo = TRUE)# 移动到 terra图书馆(土地)r <- rast(nlcd)ap <-向量(阿帕奇)# 县边界到墨卡托apm <- 项目(ap, crs(r))
为了计算网格单元的面积,我将它们表示为多边形.我首先聚合以获得更大的单元格以避免获得太多的小多边形(这将花费很长时间,并且可能会导致 R 崩溃).然后我将墨卡托多边形转换为经度/纬度,计算它们的真实面积,然后转换回来(只是为了保持一致的显示目的).
f <- 300a <- 聚合(rast(r),f)p <- as.polygons(a)# 计算区域g <- 项目(p,+proj=longlat")g$area <- round(expanse(g)/(f * f), 5)# 项目返回并绘制merc <- 项目(g, crs(r))绘图(merc,区域",边界 = NA)线路(apm)
地图显示了原始900 m2"大小的近似变化.单元格(在 573 和 625 之间).当我使用原始数据时,情况并非如此,如下图所示.
图书馆(土地)#文件名"是具有 nlcd 数据的本地文件usnlcd <- rast(文件名)crs(usnlcd, proj4=TRUE)#[1] +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"资源(x)#[1] 30 30
请注意,+proj=aea
代表 Albers 等面积 投影!
ap <- vect(apache)apm <- 项目(ap, crs(usnlcd))x <- 作物(usnlcd,apm)par(mfrow=c(1,2))情节(x)线路(apm)# gg2 计算如上绘图(gg2,区域",边界= NA)
如您所见,单元面积确实是 900 m2,只有很小的失真,可以忽略不计.
您可以将墨卡托数据转换回原始的+proj=aea
,但是这样您的数据质量就会降低两次.所以这是一个非常糟糕的主意.您也可以考虑每个单元格的真实单元格面积,但这是相当复杂的.
最后得到每个土地覆盖类别所覆盖的面积
m <- 掩码(x, apm)f<-频率(m)f <- data.frame(f)f$area <- round(f$count * 900/1e6, 1)# 下一步有点棘手,将在未来版本中由`freq` 完成levs <- 级别(m)[[1]]f$class <- levs[f$value + 1]
瞧:
f[, c(class", area")]#班级#1 开放水域 21.5#2 已开发,开放空间 175.3#3 发达,低强度 46.4#4 已开发,中等强度 9.7#5 开发,高强度 1.0#6 贫瘠之地 232.5#7 落叶林 44.6#8 常青林 7604.6#9 混合森林 2.0#10 灌木/磨砂 18262.8#11 草本植物 2514.4#12 干草/牧场 1.9#13 栽培作物 0.0#14 伍迪湿地 38.8#15 新兴草本湿地 53.6
总数符合预期
sum(f$area)#[1] 29009.1
附注.这个问题现在已经在源头上得到解决 --- 但我希望这个答案对使用墨卡托 CRS 空间数据的其他人仍然有用.
I'm trying to calculate landcover repartition for each US county. I have obtained NLCD for the Apache county using the FedData package (devtools version) and I'm using county shapefiles from the Census bureau.
The problem is that I get an area that is much larger than the official one and the one indicated in my shapefile, namely 51,000km^2 instead of 29,0000km^2 officially. There must be something I don't understand about the raster object but I'm a very confused after hours of websearching, any help appreciated.
The following describes the code used and the method used to calculate. The county data can be downloaded here: https://www2.census.gov/geo/tiger/TIGER2016/COUNTY/
The following code assumes the county shapefile is saved and unzipped.
- Get and read the data
#devtools::install_github("ropensci/FedData")
library(FedData)
library(rgdal)
library(dplyr)
#Get Apache polygone
counties<- readOGR('tl_2016_us_county/tl_2016_us_county.shp')
apache <- subset(counties,counties$GEOID=="04001")
# Get NCLD data
nlcd_data <- get_nlcd(apache,
year = 2011,
label = "Apache",
force.redo = TRUE)
nlcd_data #inspect the object, can see that number of cells is around 57 million
- I have then extracted the values of the raster and put them into a frequency table. From there I calculate the resulting area. Since the NLCD data is 30m resolution, I multiply the number of cell of each category by 900 and divide the result by 1 million to obtain the area in Square Kilometer.
The calculated total area is too large.
# Calculating the landcover repartition in County
landcover<-data.frame(x2011 = values(nlcd_data)) #Number of rows corresponds to number of cells
landcover_freq<-table(landcover)
df_landcover <- as.data.frame(landcover_freq)
res <- df_landcover %>%
mutate(area_type_sqm = Freq*900,
area_type_km=area_type_sqm/1e6,
area_sqkm = sum(area_type_km))%>%
group_by(landcover)%>%
mutate(pc_land =round(100*area_type_km/area_sqkm,1))
head(arrange(res,desc(pc_land)))
# A tibble: 6 x 6
# Groups: landcover [6]
landcover Freq area_type_sqm area_type_km area_sqkm pc_land
<fct> <int> <dbl> <dbl> <dbl> <dbl>
1 52 33455938 30110344200 30110. 51107. 58.9
2 42 16073820 14466438000 14466. 51107. 28.3
3 71 5999412 5399470800 5399. 51107. 10.6
4 31 488652 439786800 440. 51107. 0.9
5 21 362722 326449800 326. 51107. 0.6
6 22 95545 85990500 86.0 51107. 0.2
## Total area calculated from raster is 51,107 square km
apache_area <- as.data.frame(apache) %>%
mutate(AREA=(as.numeric(ALAND)+as.numeric(AWATER))/1e6) %>%
select(AREA)
apache_area$AREA
29055.47 #Official area of apache county (in square km)
- Visual inspection of the shapefile and the raster:
The difference doesn't seem large enough to justify the difference
apache <- spTransform(apache,proj4string(nlcd_data))
plot(nlcd_data)
plot(apache,add=TRUE)
The reason is that you get the data returned in the Mercator projection.
crs(nlcd_data)
CRS arguments:
+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext
+no_defs
This coordinate reference system preserves shape and that is why it is used for web-mapping. It should not be used for most other purposes, as it is also notorious for distortion of area.
The take-home message is to never just trust the nominal resolution of a projected raster and assume that it is correct and/or constant. The reliable way to compute area is to use longitude/latitude coordinates, because these are, by definition, not distorted.
The reported spatial resolution is
res(nlcd_data)
[1] 30 30
So it is not surprising that you expected that the cells have an area of 30 x 30 = 900 m2. But the cells sizes are actually between 573 and 625 m2 for Apache county. Illustrated below.
First get the data
library(FedData)
counties <- raster::getData("GADM", country="USA", level=2)
apache <- subset(counties,counties$NAME_2=="Apache")
nlcd <- get_nlcd(apache, year = 2011, label = "nlcd_apache", force.redo = TRUE)
# move to terra
library(terra)
r <- rast(nlcd)
ap <- vect(apache)
# county boundaries to Mercator
apm <- project(ap, crs(r))
To compute the area of the grid cells I represent them as polygons. I first aggregate to get much larger cells to avoid getting too many small polygons (it would take very long, and perhaps crash R). I then transform the Mercator polygons to longitude/latitude, compute their true area, and transform back (just for consistent display purposes).
f <- 300
a <- aggregate(rast(r), f)
p <- as.polygons(a)
# compute area
g <- project(p, "+proj=longlat")
g$area <- round(expanse(g) / (f * f), 5)
# project back and plot
merc <- project(g, crs(r))
plot(merc, "area", border=NA)
lines(apm)
The map shows the approximate variation in the size of the original "900 m2" cells (between 573 and 625). This is not the case when I use the original data, as illustrated below.
library(terra)
# "filename" is the local file that has the nlcd data
usnlcd <- rast(filename)
crs(usnlcd, proj4=TRUE)
#[1] "+proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
res(x)
#[1] 30 30
Note that +proj=aea
stands for the Albers Equal Area projection!
ap <- vect(apache)
apm <- project(ap, crs(usnlcd))
x <- crop(usnlcd, apm)
par(mfrow=c(1,2))
plot(x)
lines(apm)
# gg2 computed as above
plot(gg2, "area", border=NA)
As you can see, the cell area is indeed 900 m2, with only very little distortion, so small that it can be ignored.
You could transform the Mercator data back to the original +proj=aea
, but then you would have degraded the quality of the data twice. So that is a really bad idea. You could also account for the true cell area of each cell, but that is rather convoluted.
Finally, to get the area covered by each land cover class
m <- mask(x, apm)
f <- freq(m)
f <- data.frame(f)
f$area <- round(f$count * 900 / 1e6, 1)
# the next step is a bit tricky and will be done by `freq` in future versions
levs <- levels(m)[[1]]
f$class <- levs[f$value + 1]
Voila:
f[, c("class", "area")]
# class area
#1 Open Water 21.5
#2 Developed, Open Space 175.3
#3 Developed, Low Intensity 46.4
#4 Developed, Medium Intensity 9.7
#5 Developed, High Intensity 1.0
#6 Barren Land 232.5
#7 Deciduous Forest 44.6
#8 Evergreen Forest 7604.6
#9 Mixed Forest 2.0
#10 Shrub/Scrub 18262.8
#11 Herbaceuous 2514.4
#12 Hay/Pasture 1.9
#13 Cultivated Crops 0.0
#14 Woody Wetlands 38.8
#15 Emergent Herbaceuous Wetlands 53.6
And the total is as expected
sum(f$area)
#[1] 29009.1
PS. This problem has now been solved at the source --- but I hope this answer will remain useful for others using spatial data with a Mercator CRS.
这篇关于根据 NLCD(土地覆盖数据)栅格计算的县域面积过大的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!