Merging a Shapefile and a dataframe


Problem description

I am working in R with a regular dataframe (df) and a shapefile (map2), which share a common column called CD116FP. df has 103552 rows, while map2 has 444. I am loading the shapefile in the following way:

map2 <- read_sf("D:/Data/tl_2019_us_cd116.shp")

My end goal is to use the function mapview() to view the map in map2, colored by the "intensity" given in df under the column np_scores. I therefore do not want observations from df that do not appear in map2.

Here are my thoughts and failures:

  1. If these two objects were regular dataframes, merge() would be a reasonable candidate for combining them. However, when I apply that function in this case, the resulting object loses its spatial properties and mapview does not know how to read it.

  2. Another approach that I used was trying this line of code:

map2m<-data.frame(map2, df[match(map2$CD116FP, df$CD116FP),])

But the result has dimensions that are too big (much bigger than 444 rows), and hence mapview crashes when trying to plot the desired map.

  3. Finally, I went full-on brute force and constructed a loop to add the column np to map2:

map2$np <- 10

for (i in 1:nrow(map2)) {
  for (j in 1:nrow(df)) {
    if (identical(map2$CD116FP[i], df$CD116FP[j])) {
      map2$np[i] <- df$np_score[j]
    } else {
      map2$np[i] <- 0
    }
  }
}

However, this approach just takes way too much time given the dimensions of my dataframe.

Do you have any suggestions?

Solution

I'm a bit puzzled by the structure of your data. Your df has over 100,000 rows, so I'm guessing that the same CD116FP occurs multiple times in df, and the npscore will presumably vary across these instances. If you want to merge these to map2 you will need to aggregate them first.
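
Whether the key really repeats can be checked before aggregating. The following is a minimal base R sketch on hypothetical toy data (df_toy is an assumption standing in for the real df, which is not shown in the question):

```r
# Hypothetical toy data standing in for the real df (column names follow the answer).
df_toy <- data.frame(
  CD116FP  = c("01", "01", "02", "03", "03", "03"),
  npscores = c(0.2, 0.4, 0.9, 0.1, 0.5, 0.6)
)

# Number of rows per district code; any count above 1 means the key repeats.
key_counts <- table(df_toy$CD116FP)
print(key_counts)

# anyDuplicated() returns the index of the first duplicate, or 0 if there is none.
has_duplicates <- anyDuplicated(df_toy$CD116FP) > 0
print(has_duplicates)
```

If has_duplicates is TRUE for the real data, a one-to-one merge onto the 444 map rows is not possible without first reducing df to one row per CD116FP.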

Let's try to recreate a similar setup:

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1

map2 <- read_sf("C:/users/administrator/documents/shape/tl_2019_us_cd116.shp")

set.seed(69)

df <- data.frame(CD116FP = sprintf("%02d", sample(0:99, 103552, TRUE)),
                 npscores = runif(103552))

head(df)
#>   CD116FP  npscores
#> 1      95 0.6927742
#> 2      80 0.8543845
#> 3      90 0.5220353
#> 4      01 0.1449647
#> 5      76 0.9876543
#> 6      38 0.5629950

I have given df the same number of rows as your data, to show that this solution will scale to your problem.

Let's aggregate the npscores with dplyr:

library(dplyr)
df_sum <- df %>% 
  filter(CD116FP %in% map2$CD116FP) %>%
  group_by(CD116FP) %>%
  summarise(npscores = mean(npscores))

map2$npscores <- df_sum$npscores[match(map2$CD116FP, df_sum$CD116FP)]
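
As an alternative to the match() assignment, the aggregated table can probably also be attached with a join, provided the sf object is the left-hand argument. This is a sketch under assumptions, not the method used above: map_toy and df_toy are hypothetical stand-ins built from point geometries, and it relies on the dplyr join methods that the sf package provides for sf objects.

```r
library(sf)
library(dplyr)

# Hypothetical stand-in for map2: three point geometries with district codes.
map_toy <- st_sf(
  CD116FP  = c("01", "02", "03"),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2, 2)))
)

# Hypothetical stand-in for df: repeated codes, plus one code ("99") not on the map.
df_toy <- data.frame(
  CD116FP  = c("01", "01", "02", "99"),
  npscores = c(0.2, 0.4, 0.9, 0.7)
)

# Aggregate to one row per district first, so the join cannot multiply rows.
df_toy_sum <- df_toy %>%
  group_by(CD116FP) %>%
  summarise(npscores = mean(npscores))

# With the sf object on the left, the geometry column and the sf class are kept.
# Districts without a score get NA; codes absent from the map ("99") are dropped.
map_joined <- left_join(map_toy, df_toy_sum, by = "CD116FP")

print(class(map_joined))
print(nrow(map_joined))
```

The argument order is the design choice here: putting the plain dataframe first would dispatch to the ordinary dataframe method, which is one plausible reason a merge can lose the spatial properties.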

Now map2 has the aggregated npscores we can plot - for example, in ggplot:

library(ggplot2)

ggplot(map2) + 
  geom_sf(aes(fill = npscores)) +
  coord_sf(xlim = c(-180, -60),
           ylim = c(15, 70)) +
  scale_fill_gradient(low = "red", high = "gold")

Or in mapview:

library(mapview)
mapView(map2, zcol = "npscores")

Created on 2020-09-19 by the reprex package (v0.3.0)
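
One difference from the loop in the question is worth noting: the match() assignment leaves districts that have no rows in df as NA, whereas the loop was meant to give them 0. A minimal base R sketch of that behavior, using hypothetical toy vectors (map_codes and df_sum_toy are assumptions standing in for map2$CD116FP and df_sum):

```r
# Hypothetical stand-ins for map2$CD116FP and the aggregated df_sum.
map_codes <- c("01", "02", "03")
df_sum_toy <- data.frame(
  CD116FP  = c("03", "01"),
  npscores = c(0.5, 0.3)
)

# match() gives, for each map code, its row position in df_sum_toy (NA if absent).
idx <- match(map_codes, df_sum_toy$CD116FP)

# Indexing with NA yields NA, so unmatched districts get a missing score.
scores <- df_sum_toy$npscores[idx]

# Optional: treat missing districts as 0, as the loop in the question intended.
scores_zero <- ifelse(is.na(scores), 0, scores)

print(scores)
print(scores_zero)
```

Whether NA or 0 is the better choice depends on the map: NA districts are typically drawn in a separate "missing" color, while 0 places them at the low end of the color scale.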
