Merge multiple dbf files with non-matching headers in R
Problem description
I have over 800 dbf files which I need to import and merge in R. I have been able to bring in all of the files using this code:
library(foreign)
setwd("c:/temp/help/")
files <- list.files(pattern="\\.dbf$")
all.the.data <- lapply(files, read.dbf, as.is=FALSE)
DATA <- do.call("rbind",all.the.data)
However, these dbf files have different numbers of columns, and even when two files have the same number of columns, the headers may differ. Here are four of the dbf files as an example:
file01 <- structure(list(PLOTBUFFER = structure(1L, .Label = "1002_2km", class = "factor"),
VALUE_11 = 11443500, VALUE_31 = 13500, VALUE_42 = 928800,
VALUE_43 = 162000, VALUE_90 = 18900), .Names = c("PLOTBUFFER",
"VALUE_11", "VALUE_31", "VALUE_42", "VALUE_43", "VALUE_90"), row.names = c(NA,
-1L), class = "data.frame", data_types = c("C", "F", "F", "F",
"F", "F"))
file02 <- structure(list(PLOTBUFFER = structure(1L, .Label = "1002_5km", class = "factor"),
VALUE_11 = 66254400, VALUE_21 = 125100, VALUE_31 = 80100,
VALUE_41 = 4234500, VALUE_42 = 3199500, VALUE_43 = 4194000,
VALUE_52 = 376200, VALUE_90 = 72000), .Names = c("PLOTBUFFER",
"VALUE_11", "VALUE_21", "VALUE_31", "VALUE_41", "VALUE_42", "VALUE_43",
"VALUE_52", "VALUE_90"), row.names = c(NA, -1L), class = "data.frame", data_types = c("C",
"F", "F", "F", "F", "F", "F", "F", "F"))
file03 <- structure(list(PLOTBUFFER = structure(1L, .Label = "1003_2km", class = "factor"),
VALUE_11 = 1972800, VALUE_31 = 125100, VALUE_41 = 5316300,
VALUE_42 = 990900, VALUE_43 = 1995300, VALUE_52 = 740700,
VALUE_90 = 1396800, VALUE_95 = 25200), .Names = c("PLOTBUFFER",
"VALUE_11", "VALUE_31", "VALUE_41", "VALUE_42", "VALUE_43", "VALUE_52",
"VALUE_90", "VALUE_95"), row.names = c(NA, -1L), class = "data.frame", data_types = c("C",
"F", "F", "F", "F", "F", "F", "F", "F"))
file04 <- structure(list(PLOTBUFFER = structure(1L, .Label = "1003_5km", class = "factor"),
VALUE_11 = 43950600, VALUE_31 = 270000, VALUE_41 = 12969900,
VALUE_42 = 5105700, VALUE_43 = 12614400, VALUE_52 = 1491300,
VALUE_90 = 2055600, VALUE_95 = 70200), .Names = c("PLOTBUFFER",
"VALUE_11", "VALUE_31", "VALUE_41", "VALUE_42", "VALUE_43", "VALUE_52",
"VALUE_90", "VALUE_95"), row.names = c(NA, -1L), class = "data.frame", data_types = c("C",
"F", "F", "F", "F", "F", "F", "F", "F"))
I would like the dataframe to match this:
merged <- structure(list(PLOTBUFFER = structure(1:2, .Label = c("1002_2km",
"1002_5km"), class = "factor"), VALUE_11 = c(11443500, 66254400
), VALUE_21 = c(0, 125100), VALUE_31 = c(13500, 80100), VALUE_41 = c(0,
4234500), VALUE_42 = c(928800, 3199500), VALUE_43 = c(162000,
4194000), VALUE_52 = c(0, 376200), VALUE_90 = c(18900, 72000)), .Names = c("PLOTBUFFER",
"VALUE_11", "VALUE_21", "VALUE_31", "VALUE_41", "VALUE_42", "VALUE_43",
"VALUE_52", "VALUE_90"), class = "data.frame", row.names = c(NA,
-2L))
If a column is missing from one dataset, it should simply be filled in with a zero or NULL.
Thanks
-al
The suggestion by @infominer worked for the 4 files I included as an example but when I tried to use merge_recurse on the large list of 802 elements, I received an error.
files <- list.files(pattern="\\.dbf$")
all.the.data <- lapply(files, read.dbf, as.is=FALSE)
merged <- merge_recurse(all.the.data)
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: evaluation nested too deeply: infinite recursion / options(expressions=)?
Use the reshape package:
library(reshape)
merged.files <- merge_recurse(list(file01, file02, file03, file04))
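A full outer merge leaves `NA`, not zero, wherever a file lacks a column, so the zero fill asked for in the question needs one extra step. The following is a minimal sketch using base `merge(..., all = TRUE)` on two small made-up data frames (`a` and `b` are hypothetical stand-ins for the dbf contents, not the real files):

```r
# Two small hypothetical data frames with partly different VALUE_* columns.
a <- data.frame(PLOTBUFFER = "1002_2km", VALUE_11 = 10, VALUE_31 = 5)
b <- data.frame(PLOTBUFFER = "1002_5km", VALUE_11 = 20, VALUE_21 = 7)

# Full outer merge on the columns the two frames share
# (PLOTBUFFER and VALUE_11 in this example).
m <- merge(a, b, all = TRUE)

# Columns absent from one input come back as NA; replace those with 0,
# touching only the numeric columns so PLOTBUFFER is left alone.
num_cols <- vapply(m, is.numeric, logical(1))
m[num_cols][is.na(m[num_cols])] <- 0

print(m)
```

The same `NA` replacement should apply unchanged to the result of `merge_recurse`, assuming the value columns are numeric as in the example files.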
Edit:
Try this code, thanks to Ramnath:
Reduce(function(...) merge(..., all = TRUE), all.the.data)
adapted from https://stackoverflow.com/a/6947326/2747709
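Since each file in the example contributes its own row, another option that may scale better to several hundred files is to skip merging and stack the data frames by column name instead. The sketch below is one possible base R approach, not part of the original answer; `bind_fill` is a hypothetical helper name, and the two input data frames are made up for illustration:

```r
# Hypothetical helper: stack data frames by column name, adding any
# column a frame lacks and filling it with `fill` (0 by default).
bind_fill <- function(dfs, fill = 0) {
  # Union of all column names, in order of first appearance.
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) {
      d[[col]] <- fill        # add the absent column
    }
    d[all_cols]               # same column order in every frame
  })
  do.call(rbind, padded)
}

# Made-up inputs with partly different VALUE_* columns.
a <- data.frame(PLOTBUFFER = "1002_2km", VALUE_11 = 10, VALUE_31 = 5)
b <- data.frame(PLOTBUFFER = "1002_5km", VALUE_11 = 20, VALUE_21 = 7)

stacked <- bind_fill(list(a, b))
print(stacked)
```

With the list from the question, the call would be `DATA <- bind_fill(all.the.data)`. This assumes every row should be kept as is; if the same PLOTBUFFER can appear in more than one file and those rows need to be combined, a merge is still the right tool.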