Merging a Shapefile and a dataframe
Problem description
I am working in R with a regular dataframe (df) and a shapefile (map2) that share a common column called CD116FP. df has 103552 rows while map2 has 444. I am loading the shapefile in the following way:
map2 <- read_sf("D:/Data/tl_2019_us_cd116.shp")
My end goal is to use the function mapview() to view the map included in map2 with the "intensity" that is described in df under the column np_scores. I hence do not want observations of df that do not appear in map2.
Here are my thoughts and failures:
- If these two objects were regular dataframes, a reasonable candidate would be merge() to combine both objects. However, if you apply that function in this case, the resulting object loses its spatial properties and mapview does not know how to read it.
- Another approach that I tried was this line of code:
map2m<-data.frame(map2, df[match(map2$CD116FP, df$CD116FP),])
But the result has dimensions that are too big (much bigger than 444 rows) and hence mapview crashes when trying to plot the desired map.
- At last, I went full-on brute force and just constructed a loop to add the column np to map2:
map2$np <- 10
for (i in 1:nrow(map2)) {
  for (j in 1:nrow(df)) {
    if (identical(map2$CD116FP[i], df$CD116FP[j])) {
      map2$np[i] <- df$np_scores[j]
    } else {
      map2$np[i] <- 0
    }
  }
}
However, this approach just takes way too much time given the dimensions of my dataframe.
Do you have any suggestions?
I'm a bit puzzled by the structure of your data. Your df has over 100,000 rows, so I'm guessing that the same CD116FP occurs multiple times in df, and the npscores will presumably vary across these instances. If you want to merge these into map2, you will need to aggregate them first.
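To see why the aggregation matters, here is a minimal sketch on toy data. The names df_demo, keys and agg and all the values are made up for illustration and are not part of your data. With repeated keys, match() returns only the first matching row, so the other scores for that district would be silently ignored; averaging first gives one value per district.

```r
# Toy data: district "01" appears twice with different scores (hypothetical values)
df_demo <- data.frame(CD116FP = c("01", "01", "02"),
                      npscores = c(0.25, 0.75, 0.5))
keys <- c("01", "02", "03")  # stand-in for map2$CD116FP

# TRUE when at least one key is repeated
has_dups <- anyDuplicated(df_demo$CD116FP) > 0

# match() returns the position of the first match only
first_only <- df_demo$npscores[match(keys, df_demo$CD116FP)]

# Aggregate to one row per key (mean here), then look up
agg <- aggregate(npscores ~ CD116FP, data = df_demo, FUN = mean)
averaged <- agg$npscores[match(keys, agg$CD116FP)]
```

Keys that have no rows in the dataframe (here "03") come back as NA in both lookups.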
Let's try to recreate a similar setup:
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
map2 <- read_sf("C:/users/administrator/documents/shape/tl_2019_us_cd116.shp")
set.seed(69)
df <- data.frame(CD116FP = sprintf("%02d", sample(0:99, 103552, TRUE)),
                 npscores = runif(103552))
head(df)
#>   CD116FP  npscores
#> 1      95 0.6927742
#> 2      80 0.8543845
#> 3      90 0.5220353
#> 4      01 0.1449647
#> 5      76 0.9876543
#> 6      38 0.5629950
I have made df have the same number of rows as your data, to show that this solution will scale to your problem.
Let's aggregate the npscores with dplyr:
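library(dplyr)
# Keep only districts present in the map, then average the scores per district
df_sum <- df %>%
  filter(CD116FP %in% map2$CD116FP) %>%
  group_by(CD116FP) %>%
  summarise(npscores = mean(npscores))
# Look up each map row's aggregated score; districts without data get NA
map2$npscores <- df_sum$npscores[match(map2$CD116FP, df_sum$CD116FP)]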
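As an alternative to the match() lookup, a join should also work, as long as the sf object is the left-hand table so that the result keeps its sf class. This is a minimal sketch on a toy sf object, not a drop-in for your data: map_demo, scores_demo and the point geometries are made up for illustration (your map2 has polygons).

```r
library(sf)
library(dplyr)

# Toy stand-in for map2: three features with point geometries (hypothetical)
map_demo <- st_sf(
  CD116FP = c("01", "02", "03"),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2, 2)))
)

# Toy stand-in for the aggregated scores: one row per key, "03" missing
scores_demo <- data.frame(CD116FP = c("01", "02"),
                          npscores = c(0.5, 0.7))

# sf object on the left: every feature is kept, unmatched keys get NA
joined <- left_join(map_demo, scores_demo, by = "CD116FP")
```

Because the scores table has one row per key, the join cannot add rows, so the result has as many rows as the map.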
Now map2 has the aggregated npscores, which we can plot; for example, in ggplot:
library(ggplot2)
ggplot(map2) +
  geom_sf(aes(fill = npscores)) +
  coord_sf(xlim = c(-180, -60),
           ylim = c(15, 70)) +
  scale_fill_gradient(low = "red", high = "gold")
Or in mapview:
library(mapview)
mapView(map2, zcol = "npscores")
Created on 2020-09-19 by the reprex package (v0.3.0)