Rbind with new columns and data.table

Question

I need to add many large tables to an existing table, so I use rbind with the excellent data.table package. But some of the later tables have more columns than the original one, and those columns need to be included. Is there an equivalent of plyr's rbind.fill for data.table?

library(data.table)

aa <- c(1,2,3)
bb <- c(2,3,4)
cc <- c(3,4,5)

dt.1 <- data.table(cbind(aa, bb))
dt.2 <- data.table(cbind(aa, bb, cc))

dt.11 <- rbind(dt.1, dt.1)  # Works, but not what I need
dt.12 <- rbind(dt.1, dt.2)  # What I need, doesn't work
dt.12 <- rbind.fill(dt.1, dt.2)  # What I need, doesn't work either

I need to start rbinding before I have all the tables, so there is no way to know in advance what the future new columns will be called. Missing data can be filled with NA.

Answer

Here is an approach that adds each table's missing columns (filled with NA) before binding them:

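# Bind two data.tables, adding any column that exists in only one of them
# to the other as NA. Note: := and setcolorder modify A and B by reference,
# so the caller's tables gain the new columns as a side effect.
rbind.missing <- function(A, B) {

  cols.A <- names(A)
  cols.B <- names(B)

  # columns in B that A lacks: add them to A as typed NAs
  missing.A <- setdiff(cols.B, cols.A)
  if (length(missing.A) > 0L) {
    # x[NA_integer_] is a length-1 NA with the same type and class as x
    # (safer than as(NA, class(x)), which fails for multi-class columns
    # such as POSIXct)
    nas.A <- lapply(B[, missing.A, with = FALSE], function(x) x[NA_integer_])
    A[, (missing.A) := nas.A]
  }
  # columns in A that B lacks: add them to B as typed NAs
  missing.B <- setdiff(names(A), cols.B)
  if (length(missing.B) > 0L) {
    nas.B <- lapply(A[, missing.B, with = FALSE], function(x) x[NA_integer_])
    B[, (missing.B) := nas.B]
  }
  # reorder B's columns to match A so rbind lines them up correctly
  setcolorder(B, names(A))
  rbind(A, B)

}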

rbind.missing(dt.1,dt.2)

##    aa bb cc
## 1:  1  2 NA
## 2:  2  3 NA
## 3:  3  4 NA
## 4:  1  2  3
## 5:  2  3  4
## 6:  3  4  5
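In more recent data.table releases the same result is available without a helper: rbind() on data.tables accepts fill = TRUE, which matches columns by name and fills columns a table lacks with NA. A minimal sketch on the question's tables, assuming a data.table version that supports the fill argument:

```r
library(data.table)

aa <- c(1, 2, 3)
bb <- c(2, 3, 4)
cc <- c(3, 4, 5)

# same tables as in the question (data.table(aa, bb) gives the same columns
# as data.table(cbind(aa, bb)))
dt.1 <- data.table(aa, bb)
dt.2 <- data.table(aa, bb, cc)

# fill = TRUE matches columns by name; dt.1 has no cc, so its rows get NA
dt.12 <- rbind(dt.1, dt.2, fill = TRUE)
print(dt.12)
```

Unlike rbind.missing, this returns a new table and leaves dt.1 and dt.2 unchanged.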

This will not be efficient for many or large data.tables, because it only binds two tables at a time and each call builds a new, ever larger result.
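For many tables, a common pattern is to collect each table in a list as it arrives and bind them all in one pass with rbindlist(..., fill = TRUE), which matches columns by name and fills NA wherever a table lacks a column, so the names of future columns never need to be known in advance. A sketch, assuming a data.table version whose rbindlist() has the fill argument; dt.3 and its column dd are hypothetical stand-ins for a later table:

```r
library(data.table)

# tables arriving over time; later ones introduce new columns
dt.1 <- data.table(aa = c(1, 2, 3), bb = c(2, 3, 4))
dt.2 <- data.table(aa = c(1, 2, 3), bb = c(2, 3, 4), cc = c(3, 4, 5))
dt.3 <- data.table(aa = 7, dd = "x")  # hypothetical later table

pieces <- list()
# append each new table to the list instead of rbinding it immediately
for (piece in list(dt.1, dt.2, dt.3)) {
  pieces[[length(pieces) + 1L]] <- piece
}

# bind everything in one pass; columns missing from a table become NA
all.dt <- rbindlist(pieces, use.names = TRUE, fill = TRUE)
print(all.dt)
```

If a combined table is needed before all pieces have arrived, rbindlist(pieces, fill = TRUE) can be called at any point on the list collected so far.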
