Error in coloring US states [R]

Problem description

I'm trying to create a US map divided by state (without Alaska and Hawaii). Each state should be colored based on a simple criterion. I have a data set with all the states and a value indicating the investment. These are the first rows of my data:

         states investment
    1      AL    5500000
    2      AR    5000000
    3      AZ   54947100
    4      CA 3285330900
    5      CO  135520000

  • If the investment is equal to 0 (indicating a missing value in the data set), the corresponding state should be colored white.
  • If the investment is greater than 0 and at most 5500000, the corresponding state should be colored blue.
  • If the investment is greater than 5500000, the corresponding state should be colored green.
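
The three rules above can also be written without a loop. A minimal sketch using base R's cut(), assuming investments are never negative and reusing the question's column names; the five rows come from the question, while the extra "XX" row with a zero investment is a hypothetical placeholder added only to exercise the white case:

```r
# The five rows shown in the question, plus a hypothetical "XX" row with 0
dati <- data.frame(
  states     = c("AL", "AR", "AZ", "CA", "CO", "XX"),
  investment = c(5500000, 5000000, 54947100, 3285330900, 135520000, 0),
  stringsAsFactors = FALSE
)

# cut() uses right-closed intervals by default (right = TRUE):
#   (-Inf, 0]       -> "white"  (0 means missing)
#   (0, 5500000]    -> "blue"
#   (5500000, Inf]  -> "green"
dati$col <- as.character(cut(dati$investment,
                             breaks = c(-Inf, 0, 5500000, Inf),
                             labels = c("white", "blue", "green")))
dati
```

Unlike the loop, this works for any number of rows, so nothing depends on a hard-coded 1:48.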

My data set is in an Excel file, so I used the XLConnect package to load the data into R. Then I wrote a script that creates a new column to store the colors:

    dati["col"] <- NA
    for (i in 1:48) {
      if (dati$investment[i] > 0 && dati$investment[i] <= 5500000) {
        dati$col[i] <- "blue"
      }
      if (dati$investment[i] > 5500000) {
        dati$col[i] <- "green"
      }
      if (dati$investment[i] == 0) {
        dati$col[i] <- "white"
      }
    }

My new data set now looks like this:

         states investment   col
    1      AL    5500000  blue
    2      AR    5000000  blue
    3      AZ   54947100 green
    4      CA 3285330900 green

Now I use the new column, dati$col, to color the map. To create the map I use:

    map("state", lty = 1, lwd = 1, fill = TRUE, boundary = TRUE, col = dati$col)

I've noticed some problems with the map. For example, Georgia should be green but is blue on my map, and South Carolina should be green but is white:

         states investment   col
    9      GA   46008000 green
    38     SC   14000000 green

These are only two examples of wrong color matching. Do you have any idea what I'm doing wrong?

Recommended answer

The problem is that the built-in "state" database in the maps package has 63 polygons, whereas your dati data frame has only 48 rows (your loop runs over 1:48). So when you use col = dati$col, R recycles dati$col once it runs out of rows. Not only that, the rows of dati are in alphabetical order by state abbreviation, whereas the polygons in the state database are in alphabetical order by state name (more or less). So the agreement you did get is purely accidental.
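
A base-R sketch of the two effects described above. It uses R's built-in state.abb and state.name vectors to stand in for the question's data (the lower 48 states sorted by abbreviation), a hypothetical 48-element color vector, and takes the polygon count of 63 from this answer rather than loading maps:

```r
# Lower-48 abbreviations, sorted alphabetically like the question's data
abb48 <- sort(setdiff(state.abb, c("AK", "HI")))

# Full (lower-case) state name for each abbreviation
name_of <- setNames(tolower(state.name), state.abb)
rows_as_names <- unname(name_of[abb48])

# Rows sorted by abbreviation start alabama, arkansas, arizona,
# while the polygons start alabama, arizona, arkansas
head(rows_as_names, 3)

which(abb48 == "GA")  # 9, as in the question's table
which(abb48 == "SC")  # 38

# Recycling: a 48-element color vector stretched over 63 polygons
# (hypothetical colors, one per row of the data)
cols <- rep(c("white", "blue", "green"), length.out = length(abb48))
per_polygon <- rep_len(cols, 63)

# Polygons 49..63 silently reuse the colors of rows 1..15
identical(per_polygon[49:63], cols[1:15])
```

The positions of GA (9) and SC (38) in this ordering match the row numbers in the question's tables, which is consistent with the data being the lower 48 states sorted by abbreviation.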

    library(maps)
    polys <- map("state", plot = FALSE, namesonly = TRUE)
    length(polys)
    # [1] 63
    head(polys, 5)
    # [1] "alabama"     "arizona"     "arkansas"    "california"  "colorado"

Note that the first three rows of dati are AL, AR, AZ, which is a different order from the first three polygons (alabama, arizona, arkansas).

So why are there 63 polygons? Some states have (large) islands, which are treated as separate polygons. This creates a new problem, because the names of states with multiple polygons are non-standard. For example:

    polys[substr(polys, 1, 8) == "new york"]
    # [1] "new york:manhattan" "new york:main" "new york:staten island" "new york:long island"

So to create a field to merge on, you need to parse these odd names.
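
A minimal sketch of that parsing step, using base R's sub() to drop everything from the first colon onward. The polygon names are hard-coded from the outputs shown above rather than loaded from maps:

```r
# Polygon names as shown above (hard-coded excerpt, not loaded from maps)
polys <- c("alabama", "arizona",
           "new york:manhattan", "new york:main",
           "new york:staten island", "new york:long island")

# Strip the ":sub-region" suffix so every polygon carries a plain state name
state_of <- sub(":.*$", "", polys)
state_of
```

Names without a colon pass through unchanged, so the same call works on all 63 polygon names.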

One way to do this is to create a data frame with one row per polygon in the state database, merge it with dati on a common field, sort the result back into the original polygon order, and use that for the colors. It's a massive headache.

    library(maps)
    # example only: create a data frame with state abbreviation, name, and population
    # (state.abb, state.name and state.x77 are built-in R datasets; population is in thousands)
    dati <- data.frame(state      = state.abb,
                       name       = tolower(state.name),
                       population = state.x77[, "Population"],
                       stringsAsFactors = FALSE)
    dati$population[dati$population < 1000] <- 0      # artificial zeros
    # color by population, similar to the OP's use case
    dati$col <- "green"                                # most populous
    dati$col[dati$population < 5000] <- "blue"         # moderately populous
    dati$col[dati$population == 0]   <- "white"        # least populous

    polygons      <- data.frame(polyName = map("state", plot = FALSE, namesonly = TRUE),
                                stringsAsFactors = FALSE)
    polygons$id   <- seq_len(nrow(polygons))            # needed to restore the original order
    polygons$name <- sub(":.*$", "", polygons$polyName) # "new york:main" -> "new york"
    polygons <- merge(polygons, dati, all.x = TRUE)     # append color info (joined on "name")
    polygons <- polygons[order(polygons$id), ]          # restore the original polygon order
    map("state", fill = TRUE, col = polygons$col)

This is precisely why I recommend using actual shapefiles with the rgdal package, and plotting with ggplot.
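
If you stay with the maps database, the merge-and-re-sort steps can also be replaced by base R's match(), which returns one row index per polygon in the polygons' own order, so no id column or re-sort is needed. This is a swapped-in technique, not the code above; the sketch uses a hard-coded excerpt of polygon names and hypothetical colors so that it runs without maps:

```r
# Hypothetical lookup table: one color per (lower-case) state name
dati <- data.frame(name = c("alabama", "arizona", "arkansas", "new york"),
                   col  = c("blue", "green", "blue", "white"),
                   stringsAsFactors = FALSE)

# Polygon names in database order (hard-coded excerpt, not loaded from maps)
polys <- c("alabama", "arizona", "arkansas",
           "new york:manhattan", "new york:main", "district of columbia")

# For each polygon, the matching row of dati (NA when there is none)
idx <- match(sub(":.*$", "", polys), dati$name)
poly_col <- dati$col[idx]
poly_col

# With the full list of 63 names, this vector would be passed as
# map("state", fill = TRUE, col = poly_col)
```

Polygons with no matching row (here the District of Columbia) get an NA color, which should leave them unfilled.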
