rbindlist data.tables with different number of columns


Question

How do I rbindlist data.tables that have different numbers of columns, filling the missing columns with NAs the way rbind.fill does?

 library(data.table)

 DT1 <- data.table(A = 1:3)
 DT2 <- data.table(A = 4:5, B = letters[4:5])
 l <- list(DT1, DT2)
 rbindlist(l)
 #  Error in rbindlist(l) : 
 #   Item 2 has 2 columns, inconsistent with item 1 which has 1 columns

What I would like to get is:

   A B
1: 1 NA
2: 2 NA
3: 3 NA
4: 4 d
5: 5 e

Recommended answer

This feature is now implemented in commit 1266 of v1.9.3. From the NEWS:

o  'rbindlist' gains 'use.names' and 'fill' arguments and is now implemented 
   entirely in C. Closes #5249    
  -> use.names by default is FALSE for backwards compatibility (doesn't bind by 
     names by default)
  -> rbind(...) now just calls rbindlist() internally, except that 'use.names' 
     is TRUE by default, for compatibility with base (and backwards compatibility).
  -> fill by default is FALSE. If fill is TRUE, use.names has to be TRUE.
  -> At least one item of the input list has to have non-null column names.
  -> Duplicate columns are bound in the order of occurrence, like base.
  -> Attributes that might exist in individual items would be lost in the bound result.
  -> Columns are coerced to the highest SEXPTYPE, if they are different, if/when possible.
  -> And incredibly fast ;).
  -> Documentation updated in much detail. Closes DR #5158.
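
As a small illustration of the coercion rule in the NEWS entry above, the sketch below binds an integer column to a double column. The table names `ints` and `dbls` are made up for this example and do not come from the original answer.

```r
library(data.table)

# Hypothetical tables: same column name, different storage types
ints <- data.table(a = 1:2)        # integer column
dbls <- data.table(a = c(2.5, 3))  # double column

res <- rbindlist(list(ints, dbls))

# The bound column is promoted to the higher of the two types (double here)
class(res$a)
```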

See this post for benchmarks.

1) Using the fill argument of rbindlist:

DT1 <- data.table(x=1, y=2)
DT2 <- data.table(y=2, z=-1)

rbindlist(list(DT1, DT2), fill=TRUE)
#     x y  z
# 1:  1 2 NA
# 2: NA 2 -1

Note that when fill=TRUE, use.names has to be TRUE as well.
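
Applied to the tables from the question, the same call might look like the sketch below. It only uses the `fill` argument described above; the variable name `res` is introduced here for illustration.

```r
library(data.table)

# The two tables from the question: DT1 has no column B
DT1 <- data.table(A = 1:3)
DT2 <- data.table(A = 4:5, B = letters[4:5])

# fill = TRUE pads the missing column B in DT1's rows with NA
res <- rbindlist(list(DT1, DT2), fill = TRUE)

names(res)  # "A" "B"
res$B       # NA NA NA "d" "e"
```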

2) Binding tables with duplicate names appropriately:

DT1 <- data.table(x=1, x=2, y=1, y=2)
DT2 <- data.table(y=3, y=-1, y=-2)

rbindlist(list(DT1, DT2), fill=TRUE)
#     x  x y  y  y
# 1:  1  2 1  2 NA
# 2: NA NA 3 -1 -2


3) It is not limited to data.tables; it also works on data.frames and lists:

DT1 <- data.table(x=1, y=2)
DT2 <- data.frame(y=2, z=-1)
DT3 <- list(z=10)

rbindlist(list(DT1,DT2,DT3), fill=TRUE)

#     x  y  z
# 1:  1  2 NA
# 2: NA  2 -1
# 3: NA NA 10


4) If you only want to bind by name, set use.names=TRUE and leave fill off:

DT1 <- data.table(x=1, y=2)
DT2 <- data.table(y=1, x=2)

rbindlist(list(DT1,DT2), use.names=TRUE, fill=FALSE)
#    x y
# 1: 1 2
# 2: 2 1

DT1 <- data.table(x=1, y=2)
DT2 <- data.table(z=2, y=1)

# errors with fill=FALSE: the column names differ, so these can only be bound with fill=TRUE
rbindlist(list(DT1, DT2), use.names=TRUE, fill=FALSE)
# Error in rbindlist(list(DT1, DT2), use.names = TRUE, fill = FALSE) : 
    # Answer requires 3 columns whereas one or more item(s) in the input 
    # list has only 2 columns. ...
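
One way to see in advance whether `fill=TRUE` will be needed is to compare each item's column names against the union of all names. This is only a sketch using base R set operations; `all_cols`, `missing_cols` and `needs_fill` are hypothetical names, not part of data.table.

```r
library(data.table)

DT1 <- data.table(x = 1, y = 2)
DT2 <- data.table(z = 2, y = 1)
l <- list(DT1, DT2)

# Union of column names across all items, in order of first appearance
all_cols <- unique(unlist(lapply(l, names)))

# For each item, the columns it lacks relative to that union
missing_cols <- lapply(l, function(d) setdiff(all_cols, names(d)))

# fill is only required when at least one item lacks a column
needs_fill <- any(lengths(missing_cols) > 0)

res <- rbindlist(l, use.names = TRUE, fill = needs_fill)
```

Here `missing_cols` reports that the first item lacks `z` and the second lacks `x`, so `needs_fill` is TRUE and the bind succeeds with NAs in those positions.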


5) The defaults are unchanged for backwards compatibility (use.names=FALSE, fill=FALSE):

DT1 <- data.table(x=1, y=2)
DT2 <- data.table(y=1, x=2)

rbindlist(list(DT1, DT2))

#    x y
# 1: 1 2
# 2: 1 2
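
The NEWS entry also says that rbind() binds by name by default. A minimal sketch of the contrast follows, with `use.names=FALSE` passed explicitly so the result does not depend on the installed version's default; `by_position` and `by_name` are names chosen for this example.

```r
library(data.table)

DT1 <- data.table(x = 1, y = 2)
DT2 <- data.table(y = 1, x = 2)

# Binding by position: DT2's first column (y) lands under x
by_position <- rbindlist(list(DT1, DT2), use.names = FALSE)

# rbind() on data.tables matches columns by name
by_name <- rbind(DT1, DT2)

by_position$x  # 1 1
by_name$x      # 1 2
```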

HTH
