根据 NLCD(土地覆盖数据)栅格计算的县域面积过大 [英] County area calculated from NLCD (Landcover data) rasters is too large

查看:27
本文介绍了根据 NLCD(土地覆盖数据)栅格计算的县域面积过大的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试计算美国每个县的土地覆盖重新划分.我已经使用 FedData 包(devtools 版本)获得了 Apache 县的 NLCD,我正在使用人口普查局的县 shapefile.

问题是我得到的区域比官方的和我的 shapefile 中指示的区域大得多,即 51,000km^2 而不是官方的 29,0000km^2.一定有一些我不了解光栅对象的地方,但经过数小时的网络搜索后,我感到非常困惑,感谢任何帮助.

下面介绍使用的代码和计算方法.县数据可以在这里下载:

解决方案

原因是你得到了墨卡托投影返回的数据.

crs(nlcd_data)CRS 参数:+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext+no_defs

这个坐标参考系统保持形状,这就是它被用于网络映射的原因.它不应该用于大多数其他目的,因为它也因区域失真而臭名昭著.

实事求是的信息是永远不要只相信投影栅格的标称分辨率并假设它是正确的和/或恒定的.可靠计算面积的方法是使用经度/纬度坐标,因为根据定义,这些坐标不会失真.

报告的空间分辨率为

res(nlcd_data)[1] 30 30

因此,您预计单元格的面积为 30 x 30 = 900 m2 也就不足为奇了.但是阿帕奇县的像元大小实际上在 573 到 625 m2 之间.如下图.

先获取数据

库(FedData)县 <- raster::getData("GADM", country="USA", level=2)apache <-子集(县,县$NAME_2==Apache")nlcd <- get_nlcd(apache, year = 2011, label = "nlcd_apache", force.redo = TRUE)# 移动到 terra图书馆(土地)r <- rast(nlcd)ap <-向量(阿帕奇)# 县边界到墨卡托apm <- 项目(ap, crs(r))

为了计算网格单元的面积,我将它们表示为多边形.我首先聚合以获得更大的单元格以避免获得太多的小多边形(这将花费很长时间,并且可能会导致 R 崩溃).然后我将墨卡托多边形转换为经度/纬度,计算它们的真实面积,然后转换回来(只是为了保持一致的显示目的).

f <- 300a <- 聚合(rast(r),f)p <- as.polygons(a)# 计算区域g <- 项目(p,+proj=longlat")g$area <- round(expanse(g)/(f * f), 5)# 项目返回并绘制merc <- 项目(g, crs(r))绘图(merc,区域",边界 = NA)线路(apm)

地图显示了原始900 m2"大小的近似变化.单元格(在 573 和 625 之间).当我使用原始数据时,情况并非如此,如下图所示.

图书馆(土地)#文件名"是具有 nlcd 数据的本地文件usnlcd <- rast(文件名)crs(usnlcd, proj4=TRUE)#[1] +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"资源(x)#[1] 30 30

请注意,+proj=aea 代表 Albers 等面积 投影!

ap <- vect(apache)apm <- 项目(ap, crs(usnlcd))x <- 作物(usnlcd,apm)par(mfrow=c(1,2))情节(x)线路(apm)# gg2 计算如上绘图(gg2,区域",边界= NA)

如您所见,单元面积确实是 900 m2,只有很小的失真,可以忽略不计.

您可以将墨卡托数据转换回原始的+proj=aea,但是这样您的数据质量就会降低两次.所以这是一个非常糟糕的主意.您也可以考虑每个单元格的真实单元格面积,但这是相当复杂的.

最后得到每个土地覆盖类别所覆盖的面积

m <- 掩码(x, apm)f<-频率(m)f <- data.frame(f)f$area <- round(f$count * 900/1e6, 1)# 下一步有点棘手,将在未来版本中由`freq` 完成levs <- 级别(m)[[1]]f$class <- levs[f$value + 1]

瞧:

f[, c(class", area")]#班级#1 开放水域 21.5#2 已开发,开放空间 175.3#3 发达,低强度 46.4#4 已开发,中等强度 9.7#5 开发,高强度 1.0#6 贫瘠之地 232.5#7 落叶林 44.6#8 常青林 7604.6#9 混合森林 2.0#10 灌木/磨砂 18262.8#11 草本植物 2514.4#12 干草/牧场 1.9#13 栽培作物 0.0#14 伍迪湿地 38.8#15 新兴草本湿地 53.6

总数符合预期

sum(f$area)#[1] 29009.1

附注.这个问题现在已经在源头上得到解决 --- 但我希望这个答案对使用墨卡托 CRS 空间数据的其他人仍然有用.

I'm trying to calculate landcover repartition for each US county. I have obtained NLCD for the Apache county using the FedData package (devtools version) and I'm using county shapefiles from the Census bureau.

The problem is that I get an area that is much larger than the official one and the one indicated in my shapefile, namely 51,000km^2 instead of 29,0000km^2 officially. There must be something I don't understand about the raster object but I'm a very confused after hours of websearching, any help appreciated.

The following describes the code used and the method used to calculate. The county data can be downloaded here: https://www2.census.gov/geo/tiger/TIGER2016/COUNTY/

The following code assumes the county shapefile is saved and unzipped.

  1. Get and read the data

#devtools::install_github("ropensci/FedData")
library(FedData)
library(rgdal)
library(dplyr)

#Get Apache polygone
counties<- readOGR('tl_2016_us_county/tl_2016_us_county.shp')
apache <- subset(counties,counties$GEOID=="04001")

# Get NCLD data 
nlcd_data <- get_nlcd(apache,
                      year = 2011,
                      label = "Apache",
                      force.redo = TRUE)

nlcd_data #inspect the object, can see that number of cells is around 57 million

  1. I have then extracted the values of the raster and put them into a frequency table. From there I calculate the resulting area. Since the NLCD data is 30m resolution, I multiply the number of cell of each category by 900 and divide the result by 1 million to obtain the area in Square Kilometer.

The calculated total area is too large.

# Calculating the landcover repartition in County
landcover<-data.frame(x2011 = values(nlcd_data)) #Number of rows corresponds to number of cells 
landcover_freq<-table(landcover)

df_landcover <- as.data.frame(landcover_freq)

res <- df_landcover %>%
  mutate(area_type_sqm = Freq*900,
         area_type_km=area_type_sqm/1e6,
         area_sqkm = sum(area_type_km))%>%
  group_by(landcover)%>%
  mutate(pc_land =round(100*area_type_km/area_sqkm,1))

head(arrange(res,desc(pc_land)))

# A tibble: 6 x 6
# Groups:   landcover [6]
  landcover     Freq area_type_sqm area_type_km area_sqkm pc_land
  <fct>        <int>         <dbl>        <dbl>     <dbl>   <dbl>
1 52        33455938   30110344200      30110.     51107.    58.9
2 42        16073820   14466438000      14466.     51107.    28.3
3 71         5999412    5399470800       5399.     51107.    10.6
4 31          488652     439786800        440.     51107.     0.9
5 21          362722     326449800        326.     51107.     0.6
6 22           95545      85990500         86.0    51107.     0.2

## Total area calculated from raster is 51,107 square km

apache_area <- as.data.frame(apache) %>%
  mutate(AREA=(as.numeric(ALAND)+as.numeric(AWATER))/1e6) %>%
  select(AREA)

apache_area$AREA
29055.47 #Official area of apache county (in square km)

  1. Visual inspection of the shapefile and the raster:

The difference doesn't seem large enough to justify the difference

apache <- spTransform(apache,proj4string(nlcd_data))
plot(nlcd_data)
plot(apache,add=TRUE)

解决方案

The reason is that you get the data returned in the Mercator projection.

crs(nlcd_data)
CRS arguments:
 +proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext
+no_defs 

This coordinate reference system preserves shape and that is why it is used for web-mapping. It should not be used for most other purposes, as it is also notorious for distortion of area.

The take-home message is to never just trust the nominal resolution of a projected raster and assume that it is correct and/or constant. The reliable way to compute area is to use longitude/latitude coordinates, because these are, by definition, not distorted.

The reported spatial resolution is

res(nlcd_data)
[1] 30 30

So it is not surprising that you expected that the cells have an area of 30 x 30 = 900 m2. But the cells sizes are actually between 573 and 625 m2 for Apache county. Illustrated below.

First get the data

library(FedData)
counties <- raster::getData("GADM", country="USA", level=2)
apache <- subset(counties,counties$NAME_2=="Apache")
nlcd <- get_nlcd(apache, year = 2011, label = "nlcd_apache", force.redo = TRUE)

# move to terra
library(terra)
r <- rast(nlcd)
ap <- vect(apache)
# county boundaries to Mercator
apm <- project(ap, crs(r))

To compute the area of the grid cells I represent them as polygons. I first aggregate to get much larger cells to avoid getting too many small polygons (it would take very long, and perhaps crash R). I then transform the Mercator polygons to longitude/latitude, compute their true area, and transform back (just for consistent display purposes).

f <- 300
a <- aggregate(rast(r), f)
p <- as.polygons(a)

# compute area
g <- project(p, "+proj=longlat")
g$area <- round(expanse(g) / (f * f), 5)

# project back and plot
merc <- project(g, crs(r))
plot(merc, "area", border=NA)
lines(apm)

The map shows the approximate variation in the size of the original "900 m2" cells (between 573 and 625). This is not the case when I use the original data, as illustrated below.

library(terra)
# "filename" is the local file that has the nlcd data
usnlcd <- rast(filename)
crs(usnlcd, proj4=TRUE)
#[1] "+proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"

res(x)
#[1] 30 30

Note that +proj=aea stands for the Albers Equal Area projection!

ap <- vect(apache)
apm <- project(ap, crs(usnlcd))
x <- crop(usnlcd, apm)

par(mfrow=c(1,2))
plot(x)
lines(apm)
# gg2 computed as above
plot(gg2, "area", border=NA)

As you can see, the cell area is indeed 900 m2, with only very little distortion, so small that it can be ignored.

You could transform the Mercator data back to the original +proj=aea, but then you would have degraded the quality of the data twice. So that is a really bad idea. You could also account for the true cell area of each cell, but that is rather convoluted.

Finally, to get the area covered by each land cover class

m <- mask(x, apm)
f <- freq(m)
f <- data.frame(f)
f$area <- round(f$count * 900 / 1e6, 1)

# the next step is a bit tricky and will be done by `freq` in future versions
levs <- levels(m)[[1]]
f$class <- levs[f$value + 1]

Voila:

f[, c("class", "area")]
#                           class    area
#1                     Open Water    21.5
#2          Developed, Open Space   175.3
#3       Developed, Low Intensity    46.4
#4    Developed, Medium Intensity     9.7
#5      Developed, High Intensity     1.0
#6                    Barren Land   232.5
#7               Deciduous Forest    44.6
#8               Evergreen Forest  7604.6
#9                   Mixed Forest     2.0
#10                   Shrub/Scrub 18262.8
#11                   Herbaceuous  2514.4
#12                   Hay/Pasture     1.9
#13              Cultivated Crops     0.0
#14                Woody Wetlands    38.8
#15 Emergent Herbaceuous Wetlands    53.6

And the total is as expected

sum(f$area)
#[1] 29009.1
 

PS. This problem has now been solved at the source --- but I hope this answer will remain useful for others using spatial data with a Mercator CRS.

这篇关于根据 NLCD(土地覆盖数据)栅格计算的县域面积过大的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆