Creating a Choropleth map with US county level data


Problem description

I'm trying to produce a choropleth map of county level data on COVID-19 infections using R. I'm a relative newbie to R so....

I've done some fairly basic stuff with ggmap to plot spatial data, but never anything quite like this. Typically I just have points of interest that I need to overlay on a map, so I can use geom_point and their lat/lon. In this case I need to construct the underlying map and then fill regions and I'm struggling with doing that in the ggplot world.

I've followed some online examples I've found to get as far as this:

library(ggplot2)
library(broom)
library(geojsonio)

#get a county level map geoJSON file
counties <- geojson_read("https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_500k.json", what = "sp")

#filter out Alaska and Hawaii
lower48 <- counties[(counties@data$STATE != "02" & counties@data$STATE != "15") ,]

#turn it into a dataframe for ggmap
new_counties <- tidy(lower48)

# Plot it
print(ggplot() +
  geom_polygon(data = new_counties, aes( x = long, y = lat, group = group), fill="#69b3a2", color="white") +
  theme_void() +
  coord_map())

Which produces this plot:

So far so good. But my new_counties dataframe now looks like this:

head(new_counties)
# A tibble: 6 x 7
   long   lat order hole  piece group id  
  <dbl> <dbl> <int> <lgl> <chr> <chr> <chr>
1 -85.4  33.9     1 FALSE 1     0.1   0    
2 -85.4  33.9     2 FALSE 1     0.1   0    
3 -85.4  33.9     3 FALSE 1     0.1   0    
4 -85.4  33.9     4 FALSE 1     0.1   0    
5 -85.4  33.9     5 FALSE 1     0.1   0    
6 -85.4  33.8     6 FALSE 1     0.1   0 

So I've lost anything that I might be able to tie back to my county level data on infections.

My data has a 5-digit FIPS code for each county. The first two digits are the state and the last three are the county. My geoJSON file has a much more detailed FIPS code. I tried grabbing just the first 5 characters and creating my own data element that I could map back to:

library(ggplot2)
library(broom)
library(geojsonio)

#get a county level map geoJSON file
counties <- geojson_read("https://eric.clst.org/assets/wiki/uploads/Stuff/gz_2010_us_050_00_500k.json", what = "sp")

#filter out Alaska and Hawaii
lower48 <- counties[(counties@data$STATE != "02" & counties@data$STATE != "15") ,]

#add my own FIPS code
lower48@data$myFIPS <- substr(as.character(lower48@data$GEO_ID),1,5)  

#turn it into a dataframe for ggmap
new_counties <- tidy(lower48, region = "myFIPS")


# Plot it
print(ggplot() +
  geom_polygon(data = new_counties, aes( x = long, y = lat, group = group), fill="#69b3a2", color="white") +
  theme_void() +
  coord_map())

But that produces this plot

And I have to say I'm not quite familiar enough with broom::tidy to know exactly why. I also notice as I type this that I need to filter out Puerto Rico!

If anybody can point me back in a useful direction....I'm not wedded to the current approach, though I would like to stick to ggplot2 or ggmap. My boss eventually wants me to overlay certain features. Ultimately the goal is to follow the example here and produce an animated map showing data over time, but I'm obviously a long way from that.

Solution

There are many ways to do this, but the concept is basically the same: find a map containing county-level FIPS codes and use them to link with a data source that contains the same FIPS codes as well as the variable to plot (here, the number of COVID-19 cases per day).
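As a side note on the question's second attempt, here is a minimal base-R sketch of why taking the first five characters of GEO_ID probably collapses the map. It assumes GEO_ID values in that GeoJSON file look like "0500000US01001" (a summary-level prefix followed by the state and county codes); the county names and case counts below are made up purely for illustration.

```r
# Hypothetical GEO_ID values, assuming the "0500000US" + state + county layout
geo_id <- c("0500000US01001", "0500000US06037")

# The first five characters are identical for every county,
# so using them as the region key lumps all counties together
first5 <- substr(geo_id, 1, 5)

# The last five characters hold the 5-digit county FIPS code
fips <- substr(geo_id, nchar(geo_id) - 4, nchar(geo_id))

# Toy attribute tables (made-up values) joined on the FIPS key
map_attrs <- data.frame(fips = fips, name = c("Autauga", "Los Angeles"),
                        stringsAsFactors = FALSE)
cases <- data.frame(countyFIPS = c("01001", "06037"), cases = c(3, 120),
                    stringsAsFactors = FALSE)
merged <- merge(map_attrs, cases, by.x = "fips", by.y = "countyFIPS")
```

If the file also carries separate STATE and COUNTY columns, as the question's filter on STATE suggests, pasting those two together would be another way to build the same key.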

#devtools::install_github("UrbanInstitute/urbnmapr")
library(urbnmapr) # For map
library(ggplot2)  # For map
library(dplyr)    # For summarizing
library(tidyr)    # For reshaping
library(stringr)  # For padding leading zeros


# Get COVID cases, available from:
# (the URL must be a single unbroken string, otherwise the line break ends up inside it)
url <- "https://static.usafacts.org/public/data/covid-19/covid_confirmed_usafacts.csv?_ga=2.162130428.136323622.1585096338-408005114.1585096338"

COV <- read.csv(url, stringsAsFactors = FALSE)
names(COV)[1] <- "countyFIPS"  # Fix the name of the first column, which does not import cleanly

The data are stored in wide format, with daily cases per county spread across columns. They need to be gathered into long format before merging with the map, and the date labels need to be converted to proper dates. The FIPS codes are stored as integers, so they need to be converted to character strings with leading zeros in order to merge with the map data. I use the urbnmapr package for the map data.
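The two conversions can be tried in isolation first. This is only a base-R sketch on made-up values; it assumes the date columns come through read.csv with names like "X1.22.2020", which is what the `format="%m.%d.%Y"` string in the code below implies.

```r
# Hypothetical integer FIPS codes, as read.csv() would store them
fips_int <- c(1001L, 6037L, 36061L)

# Pad to five characters with leading zeros so they match the map's keys
fips_chr <- sprintf("%05d", fips_int)

# Hypothetical column names: read.csv() prefixes names that start with a digit with "X"
date_cols <- c("X1.22.2020", "X3.24.2020")

# Strip the "X" prefix and parse month.day.year into Date objects
dates <- as.Date(sub("^X", "", date_cols), format = "%m.%d.%Y")
```

`sprintf("%05d", ...)` only works on integer input; `str_pad` as used below does the same job on character values.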

Covid <- pivot_longer(COV, cols=starts_with("X"), 
                     values_to="cases",
                     names_to=c("X","date_infected"),
                     names_sep="X") %>%                
  mutate(date_infected = as.Date(date_infected, format="%m.%d.%Y"),
         countyFIPS = str_pad(as.character(countyFIPS), 5, pad="0"))

# Obtain map data for counties (to link with covid data) and states (for showing borders)
states_sf <- get_urbn_map(map = "states", sf = TRUE)
counties_sf <- get_urbn_map(map = "counties", sf = TRUE)

# Merge county map with total cases of cov
counties_cov <- inner_join(counties_sf, group_by(Covid, countyFIPS) %>%
       summarise(cases=sum(cases)), by=c("county_fips"="countyFIPS"))

counties_cov %>%
  ggplot() +
  geom_sf(mapping = aes(fill = cases), color = NA) +
  geom_sf(data = states_sf, fill = NA, color = "black", size = 0.25) +
  coord_sf(datum = NA) +   
  scale_fill_gradient(name = "Cases", trans = "log", low='pink', high='navyblue', 
                      na.value="white", breaks=c(1, max(counties_cov$cases))) +
  theme_bw() + theme(legend.position="bottom", panel.border = element_blank())
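One thing worth checking before trusting the fill values: if the daily columns hold cumulative counts rather than new cases per day (an assumption about the source file that should be verified), then summing across dates overstates each county's total, and taking the latest or maximum value would be closer to the mark. A small base-R sketch on made-up numbers shows the difference:

```r
# Made-up long-format data: two counties, three days, cumulative counts
covid_long <- data.frame(
  countyFIPS = rep(c("01001", "06037"), each = 3),
  date_infected = rep(as.Date(c("2020-03-22", "2020-03-23", "2020-03-24")), times = 2),
  cases = c(0, 1, 3, 100, 110, 120),
  stringsAsFactors = FALSE
)

# Summing cumulative counts adds the same cases again for every later day
summed <- aggregate(cases ~ countyFIPS, data = covid_long, FUN = sum)

# The per-county maximum (the latest value, if counts never decrease) is the running total
latest <- aggregate(cases ~ countyFIPS, data = covid_long, FUN = max)
```

With dplyr, the equivalent change would be `summarise(cases = max(cases))` in place of `sum(cases)`.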



For animation, you can use the gganimate package and transition through the days. The commands are similar to above except that the covid data should not be summarized.

library(gganimate)

counties_cov <- inner_join(counties_sf, Covid, by=c("county_fips"="countyFIPS"))

p <- ggplot(counties_cov) + ... # as above

p <- p + transition_time(date_infected) +
  labs(title = 'Date: {frame_time}')

animate(p, end_pause=30)

