Make rbindlist skip, ignore or change class attribute of the column


Problem description

I would like to merge a large set of data frames (about 30), each of which has about 200 variables. The datasets are very similar but not identical.

Two example data frames are shown below:

library(data.table)
library(haven)
df1 <- fread(
    "A   B   C  iso   year   
     0   B   1  NLD   2009   
     1   A   2  NLD   2009   
     0   Y   3  AUS   2011   
     1   Q   4  AUS   2011   
     0   NA  7  NLD   2008   
     1   0   1  NLD   2008   
     0   1   3  AUS   2012",
  header = TRUE
)
df2 <- fread(
    "A   B   D  E  iso   year   
     0   1   1  NA ECU   2009   
     1   0   2  0  ECU   2009   
     0   0   3  0  BRA   2011   
     1   0   4  0  BRA   2011   
     0   1   7  NA ECU   2008   
     1   0   1  0  ECU   2008   
     0   0   3  2  BRA   2012   
     1   0   4  NA BRA   2012",
  header = TRUE
)

To reproduce the error:

class(df2$B) <- "anything"

When I run the following:

df_merged <- rbindlist(list(df1, df2), fill=TRUE, use.names=TRUE)

I get this error:

Error in rbindlist(list(df1, df2), fill = TRUE, use.names = TRUE) : 
  Class attribute on column 2 of item 2 does not match with column 2 of item 1.
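
With about 30 tables and 200 variables each, it can help to list every affected column up front rather than hitting the error one column at a time. The following is a minimal sketch, not part of the original question: `find_class_mismatches` is a hypothetical helper name, and the two small tables stand in for df1 and df2.

```r
library(data.table)

# Hypothetical helper: for every column name, compare the class vectors
# across all tables that contain it and report the ones that differ.
find_class_mismatches <- function(dts) {
  all_cols <- unique(unlist(lapply(dts, names)))
  out <- list()
  for (col in all_cols) {
    has_col <- vapply(dts, function(d) col %in% names(d), logical(1))
    classes <- lapply(dts[has_col], function(d) class(d[[col]]))
    # Collapse each class vector to one string so they can be compared
    labels <- vapply(classes, paste, character(1), collapse = "/")
    if (length(unique(labels)) > 1) out[[col]] <- labels
  }
  out
}

# Small stand-ins for df1 and df2 (assumed data, for illustration only)
a <- data.table(A = 0:1, B = c("x", "y"))
b <- data.table(A = 0:1, B = c(1L, 0L))
class(b$B) <- "anything"

print(find_class_mismatches(list(a, b)))
```

Calling the helper on `list(df1, df2)` from the example above should flag column B in the same way.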

How can I do one of the following?

  1. Make rbindlist skip the column that does not match and add a suffix to it.
  2. Change the class of one of the columns to that of the other.

Desired result for option 1:

df_merged <- fread(
    "A   B  B.x  C  D   E   iso   year   
     0   A   NA  1  NA  NA  NLD   2009   
     1   Y   NA  2  NA  NA  NLD   2009   
     0   Q   NA  3  NA  NA  AUS   2011   
     1   NA  NA  4  NA  NA  AUS   2011   
     0   0   NA  7  NA  NA  NLD   2008   
     1   1   NA  1  NA  NA  NLD   2008   
     0   1   NA  3  NA  NA  AUS   2012   
     0   NA  1   NA  1  NA  ECU   2009   
     1   NA  0   NA  2  0   ECU   2009   
     0   NA  0   NA  3  0   BRA   2011   
     1   NA  0   NA  4  0   BRA   2011   
     0   NA  1   NA  7  NA  ECU   2008   
     1   NA  0   NA  1  0   ECU   2008   
     0   NA  0   NA  3  2   BRA   2012   
     1   NA  0   NA  4  NA  BRA   2012",
   header = TRUE
)

Desired result for option 2:

df_merged <- fread(
    "A   B   C  D   E   iso   year   
     0   3   1  NA  NA  NLD   2009   
     1   4   2  NA  NA  NLD   2009   
     0   5   3  NA  NA  AUS   2011   
     1   5   4  NA  NA  AUS   2011   
     0   0   7  NA  NA  NLD   2008   
     1   1   1  NA  NA  NLD   2008   
     0   1   3  NA  NA  AUS   2012   
     0   1   NA  1  NA  ECU   2009   
     1   0   NA  2  0   ECU   2009   
     0   0   NA  3  0   BRA   2011   
     1   0   NA  4  0   BRA   2011   
     0   1   NA  7  NA  ECU   2008   
     1   0   NA  1  0   ECU   2008   
     0   0   NA  3  2   BRA   2012   
     1   0   NA  4  NA  BRA   2012",
   header = TRUE
)

Recommended answer

I came up with this inelegant solution that bypasses the problem. It takes the class of each column in the first item of the list and assigns it to the same-named columns in all the other items. Keep in mind that this is risky: depending on the project it can be very bad practice, because changing a column's class can corrupt your data. However, if you simply need rbindlist to combine your data frames, this does the trick.
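
To make the caveat concrete, here is a small base R sketch (the vectors `x` and `y` are made up for illustration). Assigning a basic type name such as "character" or "integer" with `class<-` does not just relabel the vector, it converts the values, so anything that cannot be represented in the target type is lost.

```r
x <- c(0L, 1L)
class(x) <- "anything"    # arbitrary S3 class, as in the reproduction step
class(x) <- "character"   # basic type name: the integers are converted to strings
str(x)

y <- c("B", "1")
# Going the other way, a non-numeric string cannot be converted and becomes NA
suppressWarnings(class(y) <- "integer")
str(y)
```

In the example data this matters for column B: df1 holds letters there, so forcing df1's B to df2's type would discard them, while forcing df2's B to character keeps every value.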


dfs <- list(df1, df2)
varnames <- names(dfs[[1]])  # column names of the first (reference) table
# Classes of the reference columns, as a list named by column; lapply() keeps
# multi-element class vectors intact, which purrr::map_chr() would reject
vattr <- lapply(dfs[[1]], class)

# Give same-named columns in every other table the class of the reference column
for (i in seq_along(dfs)[-1]) {
  for (v in intersect(varnames, names(dfs[[i]]))) {
    class(dfs[[i]][[v]]) <- vattr[[v]]
  }
}


df_merged <- data.table::rbindlist(dfs, fill=TRUE, use.names=TRUE)
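
For option 1 from the question (keep the mismatching column separate under a suffixed name), a sketch along the following lines may work. It is not part of the original answer. The tables `d1` and `d2` are small stand-ins for df1 and df2, the ".x" suffix follows the desired output in the question, and dropping the custom class with `unclass()` before binding is an assumption made to keep the result a plain column.

```r
library(data.table)

# Small stand-ins for df1 and df2: column B differs in class (assumed data)
d1 <- data.table(A = 0:1, B = c("x", "y"), C = 1:2)
d2 <- data.table(A = 0:1, B = c(1L, 0L), D = 3:4)
class(d2$B) <- "anything"

dts <- list(d1, d2)
ref <- lapply(dts[[1]], class)   # classes of the first table's columns

for (i in seq_along(dts)[-1]) {
  dt <- copy(dts[[i]])           # copy so the input table is not changed by reference
  shared <- intersect(names(ref), names(dt))
  same <- vapply(shared, function(v) identical(class(dt[[v]]), ref[[v]]), logical(1))
  clash <- shared[!same]
  if (length(clash) > 0) {
    # Assumption: strip the custom class, then rename, e.g. B -> B.x
    for (v in clash) set(dt, j = v, value = unclass(dt[[v]]))
    setnames(dt, clash, paste0(clash, ".x"))
  }
  dts[[i]] <- dt
}

merged <- rbindlist(dts, fill = TRUE, use.names = TRUE)
print(merged)
```

Unlike the class-overwriting loop, this keeps both versions of the column side by side, at the cost of leaving NA in whichever one a given source table does not supply.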

Best,
