Maps, ggplot2, fill by state is missing certain areas on the map


Problem description


I am working with maps and ggplot2 to visualize the number of certain crimes in each state for different years. The data set that I am working with was produced by the FBI and can be downloaded from their site or from here (if you don't want to download the dataset I don't blame you, but it is way too big to copy and paste into this question, and including a fraction of the data set wouldn't help, as there wouldn't be enough information to recreate the graph).

The problem is easier seen than described.

As you can see, California is missing a large chunk, as are a few other states. Here is the code that produced this plot:

# load libraries
library(maps)
library(ggplot2)

# load data
fbi <- read.csv("http://www.hofroe.net/stat579/crimes-2012.csv")
fbi <- subset(fbi, state != "United States")
states <- map_data("state")

# merge data sets by region
fbi$region <- tolower(fbi$state)
fbimap <- merge(fbi, states, by="region")

# plot robbery numbers by state for year 2012
fbimap12 <- subset(fbimap, Year == 2012)
qplot(long, lat, geom="polygon", data=fbimap12,
  facets=~Year, fill=Robbery, group=group)

This is what the states data looks like:

    long      lat     group order  region subregion
1 -87.46201 30.38968     1     1 alabama      <NA>
2 -87.48493 30.37249     1     2 alabama      <NA>
3 -87.52503 30.37249     1     3 alabama      <NA>
4 -87.53076 30.33239     1     4 alabama      <NA>
5 -87.57087 30.32665     1     5 alabama      <NA>
6 -87.58806 30.32665     1     6 alabama      <NA>

And this is what the fbi data looks like:

    Year Population Violent Property Murder Forcible.Rape Robbery
1 1960    3266740    6097    33823    406           281     898
2 1961    3302000    5564    32541    427           252     630
3 1962    3358000    5283    35829    316           218     754
4 1963    3347000    6115    38521    340           192     828
5 1964    3407000    7260    46290    316           397     992
6 1965    3462000    6916    48215    395           367     992
   Aggravated.Assault Burglary Larceny.Theft Vehicle.Theft abbr   state region
1               4512    11626         19344          2853   AL Alabama  alabama
2               4255    11205         18801          2535   AL Alabama  alabama
3               3995    11722         21306          2801   AL Alabama  alabama
4               4755    12614         22874          3033   AL Alabama  alabama
5               5555    15898         26713          3679   AL Alabama  alabama
6               5162    16398         28115          3702   AL Alabama  alabama

I then merged the two sets on the region column. The subset I am trying to plot is

      region Year Robbery      long      lat group
8283 alabama 2012    5020 -87.46201 30.38968     1
8284 alabama 2012    5020 -87.48493 30.37249     1
8285 alabama 2012    5020 -87.95475 30.24644     1
8286 alabama 2012    5020 -88.00632 30.24071     1
8287 alabama 2012    5020 -88.01778 30.25217     1
8288 alabama 2012    5020 -87.52503 30.37249     1
       ...            ...    ...      ...

Any ideas on how I can create this plot without those ugly missing spots?

Solution

I played with your code. One thing I can tell is that something happened when you used merge. I drew the states map using geom_path and confirmed that there were a couple of odd lines that do not exist in the original map data. I then investigated further by comparing merge and inner_join. Both perform the same join here, but I found one difference: with merge, the row order changed, so the polygon points were no longer in the right sequence. This was not the case with inner_join. You will see a bit of the California data below. Your approach was right, but merge did not work in your favour. I am not sure why the function changed the order, though.
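One way to test the observation that the row order changed is to check, polygon by polygon, whether the order column is still increasing. The following is only a sketch: `rows_in_sequence` is a hypothetical helper (not part of ggplot2 or maps), and the two toy data frames merely imitate the `group` and `order` columns that map_data returns.

```r
# Hypothetical helper: TRUE when, inside every polygon (`group`),
# the vertex index `order` still increases from one row to the next.
rows_in_sequence <- function(df) {
  all(tapply(df$order, df$group, function(o) !is.unsorted(o)))
}

# Toy polygon data using the same column names as map_data() output
good <- data.frame(group = c(1, 1, 1, 2, 2, 2), order = 1:6)
bad  <- good[c(3, 1, 2, 4, 5, 6), ]  # rows of the first polygon shuffled

print(rows_in_sequence(good))
print(rows_in_sequence(bad))
```

Run on the real data, the same check on the merged and the joined data frames should show whether either one lost its vertex sequence.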

library(dplyr)

### Call US map polygon
states <- map_data("state")

### Get crime data
fbi <- read.csv("http://www.hofroe.net/stat579/crimes-2012.csv")
fbi <- subset(fbi, state != "United States")
fbi$state <- tolower(fbi$state)


### Check if both files have identical state names: The answer is NO
### states$region does not have Alaska, Hawaii, and Washington D.C.
### fbi$state does not have District of Columbia.

setdiff(fbi$state, states$region)
#[1] "alaska"           "hawaii"           "washington d. c."

setdiff(states$region, fbi$state)
#[1] "district of columbia"

### Select data for 2012 and choose two columns (i.e., state and Robbery)
fbi2 <- fbi %>%
        filter(Year == 2012) %>%
        select(state, Robbery)  
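If the District of Columbia should be filled as well, the two spellings found by setdiff could be reconciled before joining. A minimal sketch, assuming stand-in vectors: `fbi_state` and `map_region` are hypothetical shortened versions of fbi$state and states$region, not the real columns.

```r
# Hypothetical stand-ins for fbi$state and states$region
fbi_state  <- c("alabama", "alaska", "hawaii", "washington d. c.")
map_region <- c("alabama", "district of columbia")

# Recode the FBI spelling to the one used by the map data
fbi_state[fbi_state == "washington d. c."] <- "district of columbia"

# Names that remain unmatched after recoding; Alaska and Hawaii are
# expected here because the "state" map does not contain them
unmatched <- setdiff(fbi_state, map_region)
print(unmatched)
```

On the real data the same assignment would be applied to fbi$state after the tolower step.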

Now I created two data frames with merge and inner_join.

### Create two data frames with merge and inner_join
ana <- merge(fbi2, states, by.x = "state", by.y = "region")
bob <- inner_join(fbi2, states, by = c("state" = "region"))

ana %>%
    filter(state == "california") %>%
    slice(1:5)

#        state Robbery      long      lat group order subregion
#1  california   56521 -119.8685 38.90956     4   676      <NA>
#2  california   56521 -119.5706 38.69757     4   677      <NA>
#3  california   56521 -119.3299 38.53141     4   678      <NA>
#4  california   56521 -120.0060 42.00927     4   667      <NA>
#5  california   56521 -120.0060 41.20139     4   668      <NA>

bob %>%
    filter(state == "california") %>%
    slice(1:5)

#        state Robbery      long      lat group order subregion
#1  california   56521 -120.0060 42.00927     4   667      <NA>
#2  california   56521 -120.0060 41.20139     4   668      <NA>
#3  california   56521 -120.0060 39.70024     4   669      <NA>
#4  california   56521 -119.9946 39.44241     4   670      <NA>
#5  california   56521 -120.0060 39.31636     4   671      <NA>

ggplot(data = bob, aes(x = long, y = lat, fill = Robbery, group = group)) +
  geom_polygon()
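If you would rather keep merge, a common remedy is to re-sort the merged rows by the order column before plotting. By default merge sorts its result on the by columns and does not promise to keep the original row sequence within each key, which seems consistent with the California rows shown above. A minimal sketch, assuming the built-in USArrests data set as a stand-in for the FBI file (so the fill column is Assault rather than Robbery) and that the ggplot2 and maps packages are installed:

```r
library(ggplot2)  # map_data("state") also needs the maps package installed

states <- map_data("state")

# Stand-in for the FBI table (assumption): the built-in USArrests data
crime <- data.frame(state   = tolower(rownames(USArrests)),
                    Assault = USArrests$Assault)

# merge() may reorder rows, so restore the polygon vertex sequence
merged <- merge(crime, states, by.x = "state", by.y = "region")
merged <- merged[order(merged$order), ]

p <- ggplot(merged, aes(x = long, y = lat, fill = Assault, group = group)) +
  geom_polygon()
print(p)
```

With the FBI data, the same re-sorting line would go right after the merge call and before the plot.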
