Maps, ggplot2, fill by state is missing certain areas on the map

Question

I am working with maps and ggplot2 to visualize the number of certain crimes in each state for different years. The data set was produced by the FBI and can be downloaded from their site or from here. (If you don't want to download the data set I don't blame you, but it is far too big to paste into this question, and a fraction of it wouldn't help, as there wouldn't be enough information to recreate the graph.)

The problem is easier seen than described.

As you can see, California is missing a large chunk, as are a few other states. Here is the code that produced this plot:

# load libraries
library(maps)
library(ggplot2)

# load data
fbi <- read.csv("http://www.hofroe.net/stat579/crimes-2012.csv")
fbi <- subset(fbi, state != "United States")
states <- map_data("state")

# merge data sets by region
fbi$region <- tolower(fbi$state)
fbimap <- merge(fbi, states, by="region")

# plot robbery numbers by state for year 2012
fbimap12 <- subset(fbimap, Year == 2012)
qplot(long, lat, geom="polygon", data=fbimap12,
  facets=~Year, fill=Robbery, group=group)

The states data looks like this:

    long      lat     group order  region subregion
1 -87.46201 30.38968     1     1 alabama      <NA>
2 -87.48493 30.37249     1     2 alabama      <NA>
3 -87.52503 30.37249     1     3 alabama      <NA>
4 -87.53076 30.33239     1     4 alabama      <NA>
5 -87.57087 30.32665     1     5 alabama      <NA>
6 -87.58806 30.32665     1     6 alabama      <NA>

And this is what the fbi data looks like:

    Year Population Violent Property Murder Forcible.Rape Robbery
1 1960    3266740    6097    33823    406           281     898
2 1961    3302000    5564    32541    427           252     630
3 1962    3358000    5283    35829    316           218     754
4 1963    3347000    6115    38521    340           192     828
5 1964    3407000    7260    46290    316           397     992
6 1965    3462000    6916    48215    395           367     992
   Aggravated.Assault Burglary Larceny.Theft Vehicle.Theft abbr   state region
1               4512    11626         19344          2853   AL Alabama  alabama
2               4255    11205         18801          2535   AL Alabama  alabama
3               3995    11722         21306          2801   AL Alabama  alabama
4               4755    12614         22874          3033   AL Alabama  alabama
5               5555    15898         26713          3679   AL Alabama  alabama
6               5162    16398         28115          3702   AL Alabama  alabama

I then merged the two sets on region. The subset I am trying to plot is

      region Year Robbery      long      lat group
8283 alabama 2012    5020 -87.46201 30.38968     1
8284 alabama 2012    5020 -87.48493 30.37249     1
8285 alabama 2012    5020 -87.95475 30.24644     1
8286 alabama 2012    5020 -88.00632 30.24071     1
8287 alabama 2012    5020 -88.01778 30.25217     1
8288 alabama 2012    5020 -87.52503 30.37249     1
       ...            ...    ...      ...

Any ideas on how I can create this plot without those ugly missing spots?

Recommended answer

I played with your code, and the problem appears at the merge step. I drew the state map with geom_path and found a couple of stray lines that do not exist in the original map data. I then compared merge with inner_join. Both join the same rows here, but with merge the row order changed: the values in the order column were no longer in sequence, so the polygon vertices get connected in the wrong order. This was not the case with inner_join. The California rows below show the difference. Your approach was right, but merge did not work in your favour. I am not sure why the function changed the order, though.
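A quick way to test for this kind of damage is to check whether the order column is still increasing within each group. The sketch below uses a toy square instead of the real map data, and vertex_order_intact is a hypothetical helper name, not a function from maps or ggplot2. The shuffle is simulated by hand to stand in for a join that returned rows in a different sequence.

```r
# Toy polygon standing in for map_data("state"); this is not the real map.
square <- data.frame(
  long  = c(0, 1, 1, 0),
  lat   = c(0, 0, 1, 1),
  group = 1,
  order = 1:4
)

# Hypothetical helper: one TRUE/FALSE per group, TRUE when the
# vertices of that group are still in increasing drawing order.
vertex_order_intact <- function(df) {
  tapply(df$order, df$group, function(x) !is.unsorted(x))
}

# Simulate a join that handed the rows back in a different sequence.
shuffled <- square[c(3, 1, 4, 2), ]

intact_before <- vertex_order_intact(square)
intact_after  <- vertex_order_intact(shuffled)
print(intact_before)
print(intact_after)
```

Running the same helper on a merged map data frame should flag any group whose outline would be drawn with crossing lines.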

library(dplyr)

### Call US map polygon
states <- map_data("state")

### Get crime data
fbi <- read.csv("http://www.hofroe.net/stat579/crimes-2012.csv")
fbi <- subset(fbi, state != "United States")
fbi$state <- tolower(fbi$state)


### Check whether both data sets use identical state names: they do not.
### states$region has no rows for Alaska or Hawaii.
### The capital is "washington d. c." in fbi$state
### but "district of columbia" in states$region.

setdiff(fbi$state, states$region)
#[1] "alaska"           "hawaii"           "washington d. c."

setdiff(states$region, fbi$state)
#[1] "district of columbia"

### Select data for 2012 and choose two columns (i.e., state and Robbery)
fbi2 <- fbi %>%
        filter(Year == 2012) %>%
        select(state, Robbery)  
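Because the capital is spelled differently in the two sources, a join on the state name silently drops it. If you want it on the map, one option is to recode the name before joining. This is only a sketch on toy vectors: fbi_states and map_regions are illustrative names, and the two spellings are taken from the setdiff output above.

```r
# Toy vectors standing in for fbi$state and unique(states$region).
fbi_states  <- c("alabama", "washington d. c.", "wyoming")
map_regions <- c("alabama", "district of columbia", "wyoming")

# Recode the FBI spelling to the one used by the map data.
fbi_states[fbi_states == "washington d. c."] <- "district of columbia"

# Anything left here would still be dropped by an inner join.
unmatched <- setdiff(fbi_states, map_regions)
print(unmatched)
```

Alaska and Hawaii would still be reported as unmatched with the real data, since the "state" map contains no polygons for them.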

Now I created two data frames with merge and inner_join.

### Create two data frames with merge and inner_join
ana <- merge(fbi2, states, by.x = "state", by.y = "region")
bob <- inner_join(fbi2, states, by = c("state" ="region"))

ana %>%
    filter(state == "california") %>%
    slice(1:5)

#        state Robbery      long      lat group order subregion
#1  california   56521 -119.8685 38.90956     4   676      <NA>
#2  california   56521 -119.5706 38.69757     4   677      <NA>
#3  california   56521 -119.3299 38.53141     4   678      <NA>
#4  california   56521 -120.0060 42.00927     4   667      <NA>
#5  california   56521 -120.0060 41.20139     4   668      <NA>

bob %>%
    filter(state == "california") %>%
    slice(1:5)

#        state Robbery      long      lat group order subregion
#1  california   56521 -120.0060 42.00927     4   667      <NA>
#2  california   56521 -120.0060 41.20139     4   668      <NA>
#3  california   56521 -120.0060 39.70024     4   669      <NA>
#4  california   56521 -119.9946 39.44241     4   670      <NA>
#5  california   56521 -120.0060 39.31636     4   671      <NA>

ggplot(data = bob, aes(x = long, y = lat, fill = Robbery, group = group)) +
    geom_polygon()
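If you would rather keep merge, an alternative is to sort the merged rows by the order column before plotting. As far as I can tell from its documentation, merge sorts the result on the key columns and makes no promise about the sequence of rows within a key, so restoring the drawing order explicitly is the safer assumption. The sketch below uses toy data (two unit squares and made-up Robbery values), not the FBI file or the real map.

```r
# Toy stand-ins for map_data("state") and the FBI table; values are made up.
shapes <- data.frame(
  region = rep(c("b_state", "a_state"), each = 4),
  long   = c(2, 3, 3, 2, 0, 1, 1, 0),
  lat    = c(0, 0, 1, 1, 0, 0, 1, 1),
  group  = rep(c(2, 1), each = 4),
  order  = c(5:8, 1:4)
)
crime <- data.frame(region = c("a_state", "b_state"), Robbery = c(10, 20))

merged <- merge(crime, shapes, by = "region")

# Put the vertices back in drawing order, whatever merge() returned.
merged <- merged[order(merged$order), ]

# The plot call is then the same as in the answer, for example:
# ggplot(merged, aes(long, lat, group = group, fill = Robbery)) + geom_polygon()
```

With the real data this would be a single extra line after the merge, for example fbimap <- fbimap[order(fbimap$order), ].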
