Fast vectorized merge of list of data.frames by row


Problem description


Most of the questions about merging data.frames in lists on SO don't quite relate to what I'm trying to get across here, but feel free to prove me wrong.

I have a list of data.frames. I would like to "rbind" their rows into new data.frames by row position: all first rows form one data.frame, all second rows a second data.frame, and so on. The result would be a list whose length equals the number of rows in each original data.frame. So far, the data.frames are identical in dimensions.

Here's some data to play around with.

sample.list <- list(data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
        data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
        data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
        data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
        data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
        data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
        data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)))

Here's what I've come up with using the good ol' for loop.

#solution 1
my.list <- vector("list", nrow(sample.list[[1]]))
for (i in 1:nrow(sample.list[[1]])) {
    for (j in 1:length(sample.list)) {
        my.list[[i]] <- rbind(my.list[[i]], sample.list[[j]][i, ])
    }
}

#solution 2 (so far my favorite)
sample.list2 <- do.call("rbind", sample.list)
my.list2 <- vector("list", nrow(sample.list[[1]]))

for (i in 1:nrow(sample.list[[1]])) {
    my.list2[[i]] <- sample.list2[seq(from = i, to = nrow(sample.list2), by = nrow(sample.list[[1]])), ]
}

Can this be improved using vectorization without much brainhurt? Correct answer will contain a snippet of code, of course. "Yes" as an answer doesn't count.

EDIT

#solution 3 (a variant of solution 2 above)
ind <- rep(1:nrow(sample.list[[1]]), times = length(sample.list))
my.list3 <- split(x = sample.list2, f = ind)
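
To make the grouping in solution 3 easier to follow, here is a minimal sketch on a tiny, made-up input. The names `frames`, `stacked` and `by.row` and the columns `id` and `src` are assumptions for illustration only; they are not part of the original data. Because `do.call("rbind", ...)` stacks the frames one after another, repeating `1:nrow` once per frame labels every stacked row with its original row position, and `split` then groups by that label.

```r
# Two small frames with identical dimensions (hypothetical data).
frames <- list(
  data.frame(id = 1:3, src = "a"),
  data.frame(id = 1:3, src = "b")
)

# Stack all frames: rows 1-3 come from the first frame, rows 4-6 from the second.
stacked <- do.call("rbind", frames)

# Row-position label for every stacked row: 1 2 3 1 2 3
ind <- rep(seq_len(nrow(frames[[1]])), times = length(frames))

# One data.frame per original row position, named "1", "2", "3".
by.row <- split(stacked, ind)
by.row[["2"]]  # the second row of each frame
```

Note that the pieces keep the row names of the stacked data.frame, so they differ in row names (but not in values) from results built by binding single rows.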

BENCHMARKING

I've made my list larger, with more rows per data.frame, and benchmarked each solution. The timings are as follows:

#solution 1
system.time(for (i in 1:nrow(sample.list[[1]])) {
    for (j in 1:length(sample.list)) {
        my.list[[i]] <- rbind(my.list[[i]], sample.list[[j]][i, ])
    }
})
   user  system elapsed 
 80.989   0.004  81.210 

# solution 2
system.time(for (i in 1:nrow(sample.list[[1]])) {
    my.list2[[i]] <- sample.list2[seq(from = i, to = nrow(sample.list2), by = nrow(sample.list[[1]])), ]
})
   user  system elapsed 
  0.957   0.160   1.126 

# solution 3
system.time(split(x = sample.list2, f = ind))
   user  system elapsed 
  1.104   0.204   1.332 

# solution Gabor
system.time(lapply(1:nr, bind.ith.rows))
   user  system elapsed 
  0.484   0.000   0.485 

# solution ncray
system.time(alply(do.call("cbind",sample.list), 1,
                .fun=matrix, ncol=ncol(sample.list[[1]]), byrow=TRUE,
                dimnames=list(1:length(sample.list),names(sample.list[[1]]))))
   user  system elapsed 
 11.296   0.016  11.365
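
The size of the enlarged list is not stated above, so the timings cannot be reproduced exactly. As a rough sketch of how such a comparison could be rerun, the following times the `split` approach and the `lapply` approach on the same input and checks that they agree. The sizes `n.frames` and `n.rows`, the seed, and the names `big.list`, `res.split` and `res.lapply` are assumptions chosen for illustration; absolute timings will depend on the machine and R version.

```r
set.seed(1)     # assumption: fixed seed for a reproducible input
n.frames <- 20  # hypothetical number of data.frames
n.rows <- 100   # hypothetical number of rows per data.frame

big.list <- replicate(n.frames,
                      data.frame(x = sample(1:100, n.rows, replace = TRUE),
                                 y = sample(1:100, n.rows, replace = TRUE),
                                 capt = sample(0:1, n.rows, replace = TRUE)),
                      simplify = FALSE)

# split approach (solution 3); stacking is done outside the timed call, as above
stacked <- do.call("rbind", big.list)
ind <- rep(seq_len(n.rows), times = n.frames)
t.split <- system.time(res.split <- split(stacked, ind))

# lapply approach (solution Gabor)
t.lapply <- system.time(
  res.lapply <- lapply(seq_len(n.rows), function(i) {
    do.call(rbind, lapply(big.list, "[", i, TRUE))
  })
)

# Compare values only; row names differ between the two approaches.
same <- all(mapply(function(a, b) {
  all(unlist(a, use.names = FALSE) == unlist(b, use.names = FALSE))
}, res.split, res.lapply))

cat("split elapsed: ", t.split[["elapsed"]], "\n")
cat("lapply elapsed:", t.lapply[["elapsed"]], "\n")
cat("results agree: ", same, "\n")
```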

Solution

Try this:

bind.ith.rows <- function(i) do.call(rbind, lapply(sample.list, "[", i, TRUE))
nr <- nrow(sample.list[[1]])
lapply(1:nr, bind.ith.rows)
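
For readers unfamiliar with passing `"["` to `lapply`, here is a self-contained sketch of the same idea with a quick sanity check. The seed and the helper `make.df` are assumptions added so the example is reproducible; they are not part of the answer. The inner `lapply(sample.list, "[", i, TRUE)` calls `df[i, TRUE]` on every data.frame, that is, it extracts row `i` with all columns, and `do.call(rbind, ...)` stacks those single-row pieces.

```r
set.seed(42)  # assumption: fixed seed so the example is reproducible

# Hypothetical helper that builds one data.frame shaped like those in the question.
make.df <- function() {
  data.frame(x = sample(1:100, 10),
             y = sample(1:100, 10),
             capt = sample(0:1, 10, replace = TRUE))
}
sample.list <- replicate(7, make.df(), simplify = FALSE)

# Row i of every data.frame, stacked into one data.frame.
bind.ith.rows <- function(i) do.call(rbind, lapply(sample.list, "[", i, TRUE))
nr <- nrow(sample.list[[1]])
res <- lapply(1:nr, bind.ith.rows)

length(res)  # one element per original row position
res[[3]]     # the third row of each of the seven data.frames
```

Since `bind.ith.rows` reads `sample.list` from the enclosing environment, the list has to exist under that name before the function is called.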
