R ggplot2 merge with shapefile and csv data to fill polygons


Question


We produce daily maps that show a calculated temperature level for 30 distinct areas of our region; each area is filled with a different colour depending on its level. These maps look like this (image not shown).

Now I want to switch map generation to R. I've downloaded provincial and municipal boundaries (you can find boundaries for the whole of Spain, or here the subset for my region) and managed to plot them with ggplot2 following Hadley's example.

I can also produce an ASCII file that contains two columns: an identifier (CODINE) and the daily level. You can download it here.

This is my first script attempting to plot shapefiles with R and ggplot2 so there may be mistakes and for sure it can be improved, suggestions welcome. The following code (based on Hadley's previously mentioned) works for me:

require("rgdal")
require("maptools")
require("ggplot2")
require("plyr")

# Reading municipal boundaries

esp = readOGR(dsn=".", layer="lineas_limite_municipales_etrs89")

muni=subset(esp, esp$PROV1 == "46" | esp$PROV1 == "12" | esp$PROV1 == "3")
muni@data$id = rownames(muni@data)
muni.points = fortify(muni, region="id")
muni.df = join(muni.points, muni@data, by="id")

# Reading province boundaries

prov = readOGR(dsn=".", layer="poligonos_provincia_etrs89")

pr=subset(prov, prov$CODINE == "46" | prov$CODINE == "12" | prov$CODINE == "03" )
pr@data$id = rownames(pr@data)
pr.points = fortify(pr, region="id")
pr.df = join(pr.points, pr@data, by="id")

ggplot(muni.df) + aes(long,lat,group=group) + geom_path(color="blue") +
  coord_equal() + geom_path(data=pr.df,
  aes(x=long, y=lat, group=group), color="red", size=0.5)

This code plots a nice map with all the boundaries.

For polygon filling by level I tried to read and then merge as suggested in http://tormodboe.wordpress.com/2011/02/22/g%C3%B8y-med-kart-2/

level=read.csv("levels.dat",header=T,sep=" ")
munlevel=merge(muni.df,level,by="CODINE")

but it gives an error

Error en fix.by(by.x, x) : 'by' must specify a uniquely valid column

I am not familiar with shapefiles; maybe I need to learn more about shp data attributes to find the right way to merge both data sets. How can I merge the data so that I can plot the lines (municipal boundaries) and then fill the areas by level?

Solution

[NB: This question was asked over a month ago so OP has probably found a different way to solve their problem. I stumbled upon it while working on this related question. This answer is included in hopes it will benefit someone else.]

This appears to be what OP is asking for...

... and was produced with the following code:

require("rgdal")
require("maptools")
require("ggplot2")
require("plyr")
require("stringr")   # provides str_pad()

# read temperature data
setwd("<location of your data file>")
temp.data        <- read.csv(file = "levels.dat", header=TRUE, sep=" ", na.strings="NA", dec=".", strip.white=TRUE)
temp.data$CODINE <- str_pad(temp.data$CODINE, width = 5, side = 'left', pad = '0')

# read municipality polygons
setwd("<location of your shapefile>")
esp     <- readOGR(dsn=".", layer="poligonos_municipio_etrs89")
muni    <- subset(esp, esp$PROVINCIA == "46" | esp$PROVINCIA == "12" | esp$PROVINCIA == "3")
# fortify and merge: muni.df is used in ggplot
muni@data$id <- rownames(muni@data)
muni.df <- fortify(muni)
muni.df <- join(muni.df, muni@data, by="id")
muni.df <- merge(muni.df, temp.data, by.x="CODIGOINE", by.y="CODINE", all.x=T, all.y=F)
# create the map layers
ggp <- ggplot(data=muni.df, aes(x=long, y=lat, group=group)) 
ggp <- ggp + geom_polygon(aes(fill=LEVEL))         # draw polygons
ggp <- ggp + geom_path(color="grey", linetype=2)   # draw boundaries
ggp <- ggp + coord_equal() 
ggp <- ggp + scale_fill_gradient(low = "#ffffcc", high = "#ff4444", 
                                 space = "Lab", na.value = "grey50",
                                 guide = "colourbar")
ggp <- ggp + labs(title="Temperature Levels: Comunitat Valenciana")
# render the map
print(ggp)

Explanation:

Polygon shapefiles imported into R with readOGR(...) are of class SpatialPolygonsDataFrame and have two main sections: a polygon section, which contains the coordinates of all the points on each polygon, and a data section, which contains information about each polygon (so, one row per polygon). These can be referenced using muni@polygons and muni@data, respectively. The utility function fortify(...) converts the polygon section to a data frame organized for plotting with ggplot. So the basic workflow is:

[1] Import temperature data file (temp.data)
[2] Import polygon shapefile of municipalities (muni)
[3] Convert muni polygons to a data frame for plotting (muni.df <- fortify(...))
[4] Join columns from muni@data to muni.df
[5] Join columns from temp.data to muni.df
[6] Make the plot

The joins must be done on common fields, and this is where most of the problems come in. Each polygon in the original shapefile has a unique ID attribute. Running fortify(...) on the shapefile creates a column, id, which is based on this. But there is no ID column in the data section. Instead, the polygon IDs are stored as row names. So first we must add an id column to muni@data as follows:

muni@data$id <- rownames(muni@data)

Now we have an id field in muni@data and a corresponding id field in muni.df, so we can do the join:

muni.df <- join(muni.df, muni@data, by="id")
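As a small illustration of this step, the sketch below uses made-up toy data frames (the names attrs and pts and all their values are hypothetical, not taken from the shapefile). It uses base R's merge(...) in place of plyr's join(...) so that it runs without extra packages. It shows the row names being copied into an id column and then used as the join key.

```r
# Toy stand-in for muni@data: polygon IDs live only in the row names.
attrs <- data.frame(CODIGOINE = c("03001", "12002"),
                    NAME      = c("A", "B"),
                    row.names = c("0", "1"),
                    stringsAsFactors = FALSE)

# Toy stand-in for fortify() output: one row per vertex, with an id column.
pts <- data.frame(long = c(0, 1, 1, 5, 6, 6),
                  lat  = c(0, 0, 1, 5, 5, 6),
                  id   = c("0", "0", "0", "1", "1", "1"),
                  stringsAsFactors = FALSE)

# Copy the row names into a real column so both tables share a key.
attrs$id <- rownames(attrs)

# Attach the per-polygon attributes to every vertex row.
joined <- merge(pts, attrs, by = "id", all.x = TRUE)
print(joined)
```

Every vertex row now carries the attributes of the polygon it belongs to, which is what the later join on the municipality code relies on.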

To create the map we will need to set fill colors based on temperature level. To do that we need to join the LEVEL column from temp.data to muni.df. In temp.data there is a field CODINE which identifies the municipality. There is also, now, a corresponding field CODIGOINE in muni.df. But there's a problem: CODIGOINE is char(5), with leading zeros, whereas CODINE is integer which means leading zeros are missing (imported from Excel, perhaps?). So just joining on these two fields produces no matches. We must first convert CODINE into char(5) with leading zeros:

temp.data$CODINE <- str_pad(temp.data$CODINE, width = 5, side = 'left', pad = '0')
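To see the mismatch and the fix in isolation, here is a minimal sketch with made-up codes (the values are illustrative only). It uses base R's formatC(...) as an alternative to str_pad(...), which comes from the stringr package, so it needs no extra packages.

```r
# Codes as read from the data file: integers, so leading zeros are lost.
codine <- c(3001L, 12002L, 46250L)

# Codes as stored in the shapefile attributes: 5-character strings.
codigoine <- c("03001", "12002", "46250")

# Without padding, the code that should start with a zero does not match.
unpadded_match <- as.character(codine) == codigoine
print(unpadded_match)

# Pad to width 5 with zeros; the result is a character vector.
padded <- formatC(codine, width = 5, flag = "0", format = "d")
padded_match <- padded == codigoine
print(padded_match)
```

Either function works; the point is that both key columns must be character strings of the same width before they are compared.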

Now we can join temp.data to muni.df based on the corresponding fields.

muni.df <- merge(muni.df, temp.data, by.x="CODIGOINE", by.y="CODINE", all.x=T, all.y=F)

We use merge(...) instead of join(...) because the join fields have different names, and join(...) requires them to have the same name. (Note, however, that join(...) is faster and should be used where possible.) So, finally, we have a data frame that contains all the information for plotting the polygons, plus the temperature LEVEL, which can be used to set the fill color of each polygon.
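The sketch below, again with made-up toy data (polys and levels.df are hypothetical names), shows both the error from the question and the working merge. The error appears when by names a column that is not present in both data frames, which is presumably what happened with by="CODINE". The sketch also re-sorts by the order column afterwards, since merge(...) does not promise to keep the original row order and ggplot connects polygon vertices in row order; that last step is an extra precaution, not part of the code above.

```r
# Toy stand-in for the fortified polygons (key column: CODIGOINE).
polys <- data.frame(CODIGOINE = c("12002", "12002", "03001", "03001"),
                    order     = 1:4,
                    stringsAsFactors = FALSE)

# Toy stand-in for the temperature levels (key column: CODINE).
levels.df <- data.frame(CODINE = c("03001", "12002"),
                        LEVEL  = c(2, 5),
                        stringsAsFactors = FALSE)

# by = "CODINE" fails: polys has no column with that name.
err <- tryCatch(merge(polys, levels.df, by = "CODINE"),
                error = function(e) conditionMessage(e))
print(err)

# Name the key on each side instead; all.x = TRUE keeps unmatched polygons.
merged <- merge(polys, levels.df,
                by.x = "CODIGOINE", by.y = "CODINE",
                all.x = TRUE, all.y = FALSE)

# Restore the vertex drawing order.
merged <- merged[order(merged$order), ]
print(merged)
```

With all.x = TRUE, a municipality that has no row in the levels file is kept with LEVEL set to NA, so it is still drawn (in the na.value color) rather than dropped from the map.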

Some notes on OP's original code:

  1. OP's first map (the green one at the top) identifies "30 distinct areas for our region...". I could find no shapefile identifying those areas. The municipality file identifies 543 municipalities, and I could see no way to group these into 30 areas. In addition, the temperature level file has 542 rows, one for each municipality (more or less).

  2. OP was importing line shapefiles for municipality to draw the boundaries. You don't need that because geom_polygon(...) will draw (and fill) the polygons and geom_path(...) will draw the boundaries.
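As a rough illustration of that second point, the sketch below draws two made-up square "municipalities" (the squares data frame and its values are hypothetical) from a single data frame, filling them with geom_polygon(...) and outlining them with geom_path(...). It assumes only that ggplot2 is installed. Each ring repeats its first vertex at the end, as fortified shapefile rings normally do, so that geom_path(...) closes the outline.

```r
library(ggplot2)

# Two closed square rings; the first vertex of each is repeated at the end.
squares <- data.frame(
  long  = c(0, 1, 1, 0, 0,   1, 2, 2, 1, 1),
  lat   = c(0, 0, 1, 1, 0,   0, 0, 1, 1, 0),
  group = rep(c("a", "b"), each = 5),
  LEVEL = rep(c(1, 3), each = 5)
)

p <- ggplot(squares, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = LEVEL)) +          # fill each polygon by its level
  geom_path(color = "grey", linetype = 2) +  # outline from the same data
  coord_equal()

print(p)
```

No separate line shapefile is involved: both layers read the same vertex rows, grouped by polygon.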
