do.call(rbind, list) 用于奇数列 [英] do.call(rbind, list) for uneven number of column

查看:28
本文介绍了do.call(rbind, list) 用于奇数列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个列表,每个元素都是一个不同长度的字符向量我想将数据绑定为行,以便列名对齐",如果有额外的数据,则创建列,如果缺少数据,则创建 NA

I have a list, with each element being a character vector, of differing lengths I would like to bind the data as rows, so that the column names 'line up' and if there is extra data then create column and if there is missing data then create NAs

以下是我正在处理的数据的模拟示例

Below is a mock example of the data I am working with

x <- list()
x[[1]] <- letters[seq(2,20,by=2)]
names(x[[1]]) <- LETTERS[c(1:length(x[[1]]))]
x[[2]] <- letters[seq(3,20, by=3)]
names(x[[2]]) <- LETTERS[seq(3,20, by=3)]
x[[3]] <- letters[seq(4,20, by=4)]
names(x[[3]]) <- LETTERS[seq(4,20, by=4)]

如果我确定每个元素的格式相同,我通常会使用下面的行...

The below line would normally be what I would do if I was sure that the format for each element was the same...

do.call(rbind,x)

我希望有人提出了一个很好的小解决方案,可以匹配列名并用 NA 填充空格,同时如果在绑定过程中找到新列,则添加新列...

I was hoping that someone had come up with a nice little solution that matches up the column names and fills in blanks with NAs whilst adding new columns if in the binding process new columns are found...

推荐答案

rbind.fill 是一个很棒的函数,它在 data.frames 列表中表现得非常好.但是恕我直言,对于这种情况,当列表仅包含(命名)向量时,它可以做得更快.

rbind.fill is an awesome function that does really well on list of data.frames. But IMHO, for this case, it could be done much faster when the list contains only (named) vectors.

require(plyr)
rbind.fill(lapply(x,function(y){as.data.frame(t(y),stringsAsFactors=FALSE)}))

一种更直接的方式(至少在这种情况下是有效的):

rbind.named.fill <- function(x) {
    nam <- sapply(x, names)
    unam <- unique(unlist(nam))
    len <- sapply(x, length)
    out <- vector("list", length(len))
    for (i in seq_along(len)) {
        out[[i]] <- unname(x[[i]])[match(unam, nam[[i]])]
    }
    setNames(as.data.frame(do.call(rbind, out), stringsAsFactors=FALSE), unam)
}

基本上,我们获得了全部唯一名称,以形成最终 data.frame 的列.然后,我们创建一个长度为 input 的列表,并用 NA 填充其余的值.这可能是最棘手"的部分,因为我们必须在填充 NA 时匹配名称.然后,我们最后一次为列设置名称(如果需要,也可以使用 data.table 包中的 setnames 通过引用设置).

Basically, we get total unique names to form the columns of the final data.frame. Then, we create a list with length = input and just fill the rest of the values with NA. This is probably the "trickiest" part as we've to match the names while filling NA. And then, we set names once finally to the columns (which can be set by reference using setnames from data.table package as well if need be).

现在进行一些基准测试:

Now to some benchmarking:

# generate some huge random data:
set.seed(45)
sample.fun <- function() {
    nam <- sample(LETTERS, sample(5:15))
    val <- sample(letters, length(nam))
    setNames(val, nam)  
}
ll <- replicate(1e4, sample.fun())

功能:

# plyr's rbind.fill version:
rbind.fill.plyr <- function(x) {
    rbind.fill(lapply(x,function(y){as.data.frame(t(y),stringsAsFactors=FALSE)}))
}

rbind.named.fill <- function(x) {
    nam <- sapply(x, names)
    unam <- unique(unlist(nam))
    len <- sapply(x, length)
    out <- vector("list", length(len))
    for (i in seq_along(len)) {
        out[[i]] <- unname(x[[i]])[match(unam, nam[[i]])]
    }
    setNames(as.data.frame(do.call(rbind, out), stringsAsFactors=FALSE), unam)
}

更新(也添加了 GSee 的功能):

foo <- function (...) 
{
  dargs <- list(...)
  all.names <- unique(names(unlist(dargs)))
  out <- do.call(rbind, lapply(dargs, `[`, all.names))
  colnames(out) <- all.names
  as.data.frame(out, stringsAsFactors=FALSE)
}

基准测试:

require(microbenchmark)
microbenchmark(t1 <- rbind.named.fill(ll), 
               t2 <- rbind.fill.plyr(ll), 
               t3 <- do.call(foo, ll), times=10)
identical(t1, t2) # TRUE
identical(t1, t3) # TRUE

Unit: milliseconds
                       expr        min         lq     median         uq        max neval
 t1 <- rbind.named.fill(ll)   243.0754   258.4653   307.2575   359.4332   385.6287    10
  t2 <- rbind.fill.plyr(ll) 16808.3334 17139.3068 17648.1882 17890.9384 18220.2534    10
     t3 <- do.call(foo, ll)   188.5139   204.2514   229.0074   339.6309   359.4995    10

这篇关于do.call(rbind, list) 用于奇数列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆