Make rbindlist skip, ignore, or change the class attribute of a column
Question
I would like to merge a large set of data frames (about 30), each with about 200 variables. The datasets are very similar but not identical.
Please find two example data frames below:
library(data.table)
library(haven)
df1 <- fread(
"A B C iso year
0 B 1 NLD 2009
1 A 2 NLD 2009
0 Y 3 AUS 2011
1 Q 4 AUS 2011
0 NA 7 NLD 2008
1 0 1 NLD 2008
0 1 3 AUS 2012",
header = TRUE
)
df2 <- fread(
"A B D E iso year
0 1 1 NA ECU 2009
1 0 2 0 ECU 2009
0 0 3 0 BRA 2011
1 0 4 0 BRA 2011
0 1 7 NA ECU 2008
1 0 1 0 ECU 2008
0 0 3 2 BRA 2012
1 0 4 NA BRA 2012",
header = TRUE
)
To reproduce the error:
class(df2$B) <- "anything"
When I run the following:
df_merged <- rbindlist(list(df1, df2), fill=TRUE, use.names=TRUE)
I get this error:
Error in rbindlist(list(df1, df2), fill = TRUE, use.names = TRUE) :
Class attribute on column 2 of item 2 does not match with column 2 of item 1.
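The message identifies the columns only by position. A minimal sketch for listing, by name, the shared columns whose classes differ is shown here; dt1, dt2 and the helper class_mismatches are hypothetical stand-ins (reduced versions of the example data), not part of data.table:

```r
library(data.table)

# Reduced stand-ins for df1 and df2 (assumed data, shortened for brevity)
dt1 <- data.table(A = c(0L, 1L), B = c("B", "A"), C = 1:2)
dt2 <- data.table(A = c(0L, 1L), B = c(1L, 0L), D = 1:2)
class(dt2$B) <- "anything"  # same trick as in the question

# Hypothetical helper: names of the columns present in both tables
# whose class() vectors are not identical
class_mismatches <- function(x, y) {
  shared <- intersect(names(x), names(y))
  differs <- vapply(
    shared,
    function(v) !identical(class(x[[v]]), class(y[[v]])),
    logical(1)
  )
  shared[differs]
}

mismatched <- class_mismatches(dt1, dt2)
print(mismatched)
```

Running this over every pair (first item versus each other item) of the real list should show which of the roughly 200 variables need attention before calling rbindlist.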
How can I either:
- Make rbindlist skip the column that does not match and add a suffix to it.
- Change the class of one of the columns to that of the other one.
Desired result for option 1:
df_merged <- fread(
"A B B.x C D E iso year
0 A NA 1 NA NA NLD 2009
1 Y NA 2 NA NA NLD 2009
0 Q NA 3 NA NA AUS 2011
1 NA NA 4 NA NA AUS 2011
0 0 NA 7 NA NA NLD 2008
1 1 NA 1 NA NA NLD 2008
0 1 NA 3 NA NA AUS 2012
0 NA 1 NA 1 NA ECU 2009
1 NA 0 NA 2 0 ECU 2009
0 NA 0 NA 3 0 BRA 2011
1 NA 0 NA 4 0 BRA 2011
0 NA 1 NA 7 NA ECU 2008
1 NA 0 NA 1 0 ECU 2008
0 NA 0 NA 3 2 BRA 2012
1 NA 0 NA 4 NA BRA 2012",
header = TRUE
)
Desired result for option 2:
df_merged <- fread(
"A B C D E iso year
0 3 1 NA NA NLD 2009
1 4 2 NA NA NLD 2009
0 5 3 NA NA AUS 2011
1 5 4 NA NA AUS 2011
0 0 7 NA NA NLD 2008
1 1 1 NA NA NLD 2008
0 1 3 NA NA AUS 2012
0 1 NA 1 NA ECU 2009
1 0 NA 2 0 ECU 2009
0 0 NA 3 0 BRA 2011
1 0 NA 4 0 BRA 2011
0 1 NA 7 NA ECU 2008
1 0 NA 1 0 ECU 2008
0 0 NA 3 2 BRA 2012
1 0 NA 4 NA BRA 2012",
header = TRUE
)
Recommended answer
I came up with this inelegant solution that bypasses the problem. It takes the class of each column in the first item of the list and assigns it to the same-named columns in all the other items. Keep in mind that this is risky: depending on the project it can be very bad practice, because overwriting a class has the potential to mess up your data. However, if all you need is to combine your data frames with rbindlist, this does the trick:
dfs <- list(df1, df2)
varnames <- names(dfs[[1]])  # column names of the first item
# Class of each column in the first item, kept as a list because a column
# can carry more than one class (purrr::map_chr would fail in that case)
vattr <- lapply(varnames, function(v) class(dfs[[1]][[v]]))
for (i in seq_along(dfs)) {
  # give same-named columns in every item the class used in the first item
  for (j in seq_along(varnames)) {
    if (varnames[[j]] %in% names(dfs[[i]])) {
      class(dfs[[i]][[varnames[[j]]]]) <- vattr[[j]]
    }
  }
}
df_merged <- data.table::rbindlist(dfs, fill = TRUE, use.names = TRUE)
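The code above addresses option 2 only. For option 1, a rough sketch is to rename any clashing column with a suffix before binding, so that rbindlist treats it as a separate column and fills the gaps with NA. The tables dt1 and dt2 and the ".x" suffix are assumptions chosen to mirror the question, and this has only been reasoned through on this small example:

```r
library(data.table)

# Reduced stand-ins for df1 and df2 (assumed data, shortened for brevity)
dt1 <- data.table(A = c(0L, 1L), B = c("B", "A"), C = 1:2)
dt2 <- data.table(A = c(0L, 1L), B = c(1L, 0L), D = 1:2)
class(dt2$B) <- "anything"

dts <- list(dt1, dt2)
ref <- dts[[1]]  # the first item defines the reference classes

for (i in seq_along(dts)[-1]) {
  dt <- copy(dts[[i]])  # copy so the input tables are not renamed by reference
  shared <- intersect(names(ref), names(dt))
  # shared columns whose class differs from the reference item
  clash <- shared[vapply(
    shared,
    function(v) !identical(class(ref[[v]]), class(dt[[v]])),
    logical(1)
  )]
  if (length(clash) > 0) {
    setnames(dt, clash, paste0(clash, ".x"))  # e.g. B becomes B.x
  }
  dts[[i]] <- dt
}

merged <- rbindlist(dts, fill = TRUE, use.names = TRUE)
```

Here merged keeps the original B from the first table and gains a separate B.x holding the values from the second one. With 30 tables, two later items could still clash with each other under the same suffixed name, so a per-item suffix may be safer in practice.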