Fast vectorized merge of list of data.frames by row
Problem description
Most of the questions on SO about merging data.frames in lists don't quite relate to what I'm trying to get across here, but feel free to prove me wrong.
I have a list of data.frames. I would like to "rbind" their rows into new data.frames by row position. In essence, all the first rows form one data.frame, all the second rows form a second data.frame, and so on. The result would be a list whose length equals the number of rows in my original data.frame(s). So far, the data.frames are identical in dimensions.
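To make the target concrete, here is a minimal sketch on toy input. The two data frames `a` and `b` and the name `wanted` are made up for illustration and are not part of my real data; the point is only to show the shape of the result.

```r
# Two hypothetical 3-row data frames with identical dimensions
a <- data.frame(x = 1:3, y = 4:6)
b <- data.frame(x = 7:9, y = 10:12)
toy.list <- list(a, b)

# Desired result: one data.frame per row position,
# each holding that row taken from every input data.frame
wanted <- lapply(seq_len(nrow(a)), function(i) {
  do.call(rbind, lapply(toy.list, function(d) d[i, , drop = FALSE]))
})

length(wanted)   # one element per original row
wanted[[1]]      # first row of a, then first row of b
```

So `wanted[[1]]` stacks the first rows of `a` and `b`, `wanted[[2]]` stacks the second rows, and so on.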
Here's some data to play around with.
sample.list <- list(
  data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
  data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
  data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
  data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
  data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
  data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
  data.frame(x = sample(1:100, 10), y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)))
Here's what I've come up with using the good ol' for loop.
# solution 1
my.list <- vector("list", nrow(sample.list[[1]]))
for (i in 1:nrow(sample.list[[1]])) {
  for (j in 1:length(sample.list)) {
    my.list[[i]] <- rbind(my.list[[i]], sample.list[[j]][i, ])
  }
}
# solution 2 (so far my favorite)
sample.list2 <- do.call("rbind", sample.list)
my.list2 <- vector("list", nrow(sample.list[[1]]))
for (i in 1:nrow(sample.list[[1]])) {
  my.list2[[i]] <- sample.list2[seq(from = i, to = nrow(sample.list2), by = nrow(sample.list[[1]])), ]
}
Can this be improved using vectorization without much brainhurt? The correct answer will contain a snippet of code, of course. "Yes" as an answer doesn't count.
Edit
# solution 3 (a variant of solution 2 above)
ind <- rep(1:nrow(sample.list[[1]]), times = length(sample.list))
my.list3 <- split(x = sample.list2, f = ind)
Benchmarking
I've made my list larger, with more rows per data.frame, and benchmarked the solutions. The results are as follows:
# solution 1
system.time(for (i in 1:nrow(sample.list[[1]])) {
  for (j in 1:length(sample.list)) {
    my.list[[i]] <- rbind(my.list[[i]], sample.list[[j]][i, ])
  }
})
   user  system elapsed
 80.989   0.004  81.210
# solution 2
system.time(for (i in 1:nrow(sample.list[[1]])) {
  my.list2[[i]] <- sample.list2[seq(from = i, to = nrow(sample.list2), by = nrow(sample.list[[1]])), ]
})
   user  system elapsed
  0.957   0.160   1.126
# solution 3
system.time(split(x = sample.list2, f = ind))
   user  system elapsed
  1.104   0.204   1.332
# solution Gabor
system.time(lapply(1:nr, bind.ith.rows))
   user  system elapsed
  0.484   0.000   0.485
# solution ncray (alply() comes from the plyr package)
system.time(alply(do.call("cbind", sample.list), 1,
                  .fun = matrix, ncol = ncol(sample.list[[1]]), byrow = TRUE,
                  dimnames = list(1:length(sample.list), names(sample.list[[1]]))))
   user  system elapsed
 11.296   0.016  11.365
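For anyone who wants to rerun a comparison like this, here is a rough, self-contained sketch of a timing harness. The sizes (`n.frames`, `n.rows`), the seed, and the helper `make.df` are assumptions made for illustration; they are not the sizes behind the timings above, so the numbers it prints will differ.

```r
set.seed(1)      # assumed seed, only for reproducibility
n.frames <- 20   # hypothetical number of data.frames
n.rows <- 200    # hypothetical rows per data.frame

# Hypothetical helper that builds one data.frame shaped like the sample data
make.df <- function(n) {
  data.frame(x = sample(1:100, n, replace = TRUE),
             y = sample(1:100, n, replace = TRUE),
             capt = sample(0:1, n, replace = TRUE))
}
big.list <- replicate(n.frames, make.df(n.rows), simplify = FALSE)

# split-based approach (solution 3): stack everything, then split by row position
stacked <- do.call(rbind, big.list)
ind <- rep(seq_len(n.rows), times = n.frames)
t.split <- system.time(by.split <- split(stacked, ind))

# lapply-based approach: pull row i from every data.frame and rbind the pieces
t.lapply <- system.time(
  by.lapply <- lapply(seq_len(n.rows), function(i) {
    do.call(rbind, lapply(big.list, "[", i, TRUE))
  })
)

# Elapsed seconds for each approach
print(c(split = t.split[["elapsed"]], lapply = t.lapply[["elapsed"]]))
```

Both approaches should produce one data.frame per row position with the rows in the same order; only the row names differ.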
Recommended answer
Try this:
bind.ith.rows <- function(i) do.call(rbind, lapply(sample.list, "[", i, TRUE))
nr <- nrow(sample.list[[1]])
lapply(1:nr, bind.ith.rows)
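The one-liner works because `lapply(sample.list, "[", i, TRUE)` calls `d[i, TRUE]` on every data.frame `d`, i.e. it extracts row `i` with all columns, and `do.call(rbind, ...)` then stacks those one-row pieces. Below is a more verbose sketch of the same idea. The function name `bind.rows.at`, the seed, and the row-name reset are additions for illustration, not part of the answer above.

```r
set.seed(42)  # assumed seed, only so the example is reproducible
sample.list <- replicate(
  7,
  data.frame(x = sample(1:100, 10),
             y = sample(1:100, 10),
             capt = sample(0:1, 10, replace = TRUE)),
  simplify = FALSE
)

# Hypothetical, more explicit variant: takes the list as an argument
# instead of reading sample.list from the enclosing environment
bind.rows.at <- function(i, dfs) {
  rows <- lapply(dfs, function(d) d[i, , drop = FALSE])  # row i of each data.frame
  out <- do.call(rbind, rows)                            # stack the one-row pieces
  rownames(out) <- NULL                                  # replace inherited row names with 1..n
  out
}

nr <- nrow(sample.list[[1]])
result <- lapply(seq_len(nr), bind.rows.at, dfs = sample.list)
result[[3]]  # third row of each of the 7 data.frames
```

Passing the list in as `dfs` makes the function reusable on other lists, at the cost of a slightly longer call.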