Spatial aggregation with a group by


Question

I am trying to calculate grouped averages based on a spatial aggregation.

I have two shapefiles: census tracts and wards. Each ward has a value that I would like to average, by a factor, within each census tract.

Here are the shapefiles:

library(dplyr)
library(rgeos)
library(rgdal)
# Census tracts
download.file("http://www12.statcan.gc.ca/census-recensement/2011/geo/bound-limit/files-fichiers/gct_000b11a_e.zip", 
    destfile = "gct_000b11a_e.zip")
unzip("gct_000b11a_e.zip", exdir = "tracts")
census_tracts <- readOGR(dsn = "tracts", layer = "gct_000b11a_e") %>%
  spTransform(CRS('+init=epsg:4326'))

# Wards
download.file("http://opendata.toronto.ca/gcc/voting_subdivision_2010_wgs84.zip",
                destfile = "subdivisions_2010.zip")
unzip("subdivisions_2010.zip", exdir="wards")
wards <- readOGR(dsn = "wards", layer = "VOTING_SUBDIVISION_2010_WGS84") %>%
  spTransform(proj4string(census_tracts))

Then I subset the census tracts to just those that overlap the wards:

census_tracts_in_wards <- census_tracts[wards, ]

I have data for each ward with a two-level factor:

df <- expand.grid(AREA_ID = wards$AREA_ID, factor = as.factor(letters[1:2]))
df$value <- rnorm(n = nrow(df))
wards@data <- left_join(wards@data, df)

Now (finally getting to my question), I would like to calculate the mean value in each census tract, as an aggregation of the wards within that tract. I think this is how to calculate the mean for each census tract:

ag <- aggregate(x = wards["value"], by = census_tracts_in_wards, FUN = mean)

Is there a way to do this by factor? I'd like the ag spatial data frame to include a factor column and a column for the mean value in each census tract. Essentially the equivalent of:

result <- df %>% 
  group_by(AREA_ID, factor) %>% 
  summarize(value = mean(value))

But grouped by CTUID from census_tracts_in_wards instead of AREA_ID in wards.
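To make the target concrete, here is a minimal non-spatial sketch of the desired long-format result in base R. It assumes a ward-to-tract lookup table already exists; the lookup data frame and the toy values below are hypothetical, and in practice the assignment would have to come from a spatial overlay, which is not shown.

```r
# Hypothetical ward-level data in long format: one row per ward and factor level
df <- data.frame(
  AREA_ID = rep(1:4, times = 2),
  factor  = rep(c("a", "b"), each = 4),
  value   = c(1, 3, 5, 7, 10, 20, 30, 40)
)

# Hypothetical lookup assigning each ward to a census tract
lookup <- data.frame(AREA_ID = 1:4, CTUID = c("t1", "t1", "t2", "t2"))

# Attach the tract ID to each ward row, then average by tract and factor
joined <- merge(df, lookup, by = "AREA_ID")
result <- aggregate(value ~ CTUID + factor, data = joined, FUN = mean)
result
```

The result has one row per combination of CTUID and factor, which is the shape the question is asking the spatial aggregation to produce.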

As suggested by Pierre Lafortune, the formula syntax seems natural here. But none of these work:

ag2 <- aggregate(x = wards["value"] ~ wards["factor"], 
  by = census_tracts_in_wards, FUN = mean)
ag3 <- aggregate(x = wards["value" ~ "factor"], 
  by = census_tracts_in_wards, FUN = mean)
ag4 <- aggregate(x = wards["value ~ factor"], 
  by = census_tracts_in_wards, FUN = mean)

Perhaps the grouping belongs in the FUN call?

Recommended answer

Prompted by Edzer Pebesma, a closer read of the sp::aggregate documentation indicates that FUN is applied to each attribute of x. So, instead of creating a long table with a factor column, creating two separate columns (one per factor level) seems to work.

wards2 <- readOGR(dsn = "wards", layer = "VOTING_SUBDIVISION_2010_WGS84") %>%
  spTransform(proj4string(census_tracts))
wards2@data <- dplyr::select(wards2@data, AREA_ID) # Drop the other attributes
df2 <- tidyr::spread(df, factor, value)
wards2@data <- left_join(wards2@data, df2)
ag5 <- aggregate(x = wards2, by = census_tracts_in_wards, FUN = mean)
ag5@data <- dplyr::select(ag5@data, -(AREA_ID)) # The mean of AREA_ID is meaningless 
summary(ag5)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
##         min       max
## x -79.73389 -79.08603
## y  43.56243  43.89091
## Is projected: FALSE 
## proj4string :
## [+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84
## +towgs84=0,0,0]
## Data attributes:
##        a                  b            
##  Min.   :-1.28815   Min.   :-1.835409  
##  1st Qu.:-0.24883   1st Qu.:-0.289510  
##  Median : 0.01048   Median : 0.008777  
##  Mean   : 0.02666   Mean   :-0.011018  
##  3rd Qu.: 0.25450   3rd Qu.: 0.265358  
##  Max.   : 1.92769   Max.   : 1.399876
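
The same idea can be checked on a toy example that needs only the sp package. This is a sketch under stated assumptions, not the code from the answer: the two square tracts, the four ward locations and their values are all hypothetical, the wards are represented as points rather than polygons so that no extra geometry package is required, and base R's reshape() stands in for tidyr::spread.

```r
library(sp)

# Hypothetical long table: four wards, two factor levels each
df <- data.frame(
  AREA_ID = rep(1:4, times = 2),
  factor  = rep(c("a", "b"), each = 4),
  value   = c(1, 3, 5, 7, 10, 20, 30, 40)
)

# Long to wide: one column per factor level
wide <- reshape(df, idvar = "AREA_ID", timevar = "factor", direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))  # value.a -> a, value.b -> b

# Two hypothetical unit-square "tracts" side by side
square <- function(x0, id) {
  coords <- cbind(c(x0, x0 + 1, x0 + 1, x0, x0), c(0, 0, 1, 1, 0))
  Polygons(list(Polygon(coords)), ID = id)
}
tracts <- SpatialPolygons(list(square(0, "t1"), square(1, "t2")))

# Hypothetical ward locations as points: wards 1-2 fall in t1, wards 3-4 in t2
wards_pts <- SpatialPointsDataFrame(
  coords = cbind(x = c(0.25, 0.75, 1.25, 1.75), y = rep(0.5, 4)),
  data   = data.frame(a = wide$a, b = wide$b)
)

# FUN is applied to every attribute column, giving one mean per tract per column
ag <- aggregate(x = wards_pts, by = tracts, FUN = mean)
ag@data
```

Each tract ends up with one column of means per factor level, so the wide layout carries the grouping that the long factor column could not.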
