R: convert nested for loop to lapply() for better performance
Question
I am having difficulty converting my nested for loop to lapply() for speed reasons.
I have two data.tables, and I loop over every row of each to compare their contents and, if they are equal, do some calculations. It takes more than 10 minutes to do the calculations for my datasets of about 1000 rows and 360 rows.
In this minimal example it takes less than a second, but each table has only 3 rows:
library(data.table)
library(tictoc)
name <- c(rep("apple", 2), rep("banana", 2), rep("citrus", 2))
stim <- c("nc", "alk", "nc", "lem", "haz", "nc")
vis <- c(1, 1, 1, 1, 6, 7)
f <- c(2, 2, 2, 1, 3, 3)
g <- c(2, 2, 2, 2, 4, 4)
h <- c(rep(2, 6))
value <- c(5, 10, 5, 10, 10, 5)
tab <- data.table(name, stim, vis, f, g, h, value)
tab1 <- tab[stim == "nc"]
tab2 <- tab[!(stim == "nc")]
tic("looping")
for (i in 1:NROW(tab1)) {
  for (n in 1:NROW(tab2)) {
    if (identical(tab2[n, name], tab1[i, name])
        & identical(tab2[n, vis], tab1[i, vis])
        & identical(tab2[n, 3:(length(tab2) - 1), with = FALSE], tab1[i, 3:(length(tab1) - 1), with = FALSE])) {
      tab2[n, "value"] <- tab2[n, "value"] - tab1[i, "value"]
    }
  }
}
toc()
I've been looking at the apply family, and it seems to be one way to go, but I cannot figure out how to solve it. I appreciate any help!
Before looping, tab1 looks like this:
     name stim vis f g h value
1:  apple   nc   1 2 2 2     5
2: banana   nc   1 2 2 2     5
3: citrus   nc   7 3 4 2     5
tab2 looks like this:
     name stim vis f g h value
1:  apple  alk   1 2 2 2    10
2: banana  lem   1 1 2 2    10
3: citrus  haz   6 3 4 2    10
After looping (only tab2 is of interest), the expected result is:
     name stim vis f g h value
1:  apple  alk   1 2 2 2     5
2: banana  lem   1 1 2 2    10
3: citrus  haz   6 3 4 2    10
Recommended answer
An apply loop will not speed up your computation. In fact, it WILL make it slower, since you already have your data.frames defined and you are just replacing values.
Instead, I suggest an alternative approach using merge. (Note: your code had some errors and did not run, so I hope I am interpreting your intentions correctly. If not, let me know.)
> tab3 <- merge(tab1, tab2, by = c("name", "vis", "f", "g", "h"), suffixes = c("1", "2"), all.y = TRUE)
> tab3$value <- tab3$value2 - tab3$value1
> tab3
    name vis f g h stim1 value1 stim2 value2 value
1  apple   1 2 2 2    nc      5   alk     10     5
2 banana   1 1 2 2  <NA>     NA   lem     10    NA
3 citrus   6 3 4 2  <NA>     NA   haz     10    NA
From there you can rename or move your columns as you like.
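Since the question already uses data.table, the same matching can probably also be written as an update join, which changes tab2 in place and leaves unmatched rows at their original value instead of NA. This is only a sketch and not part of the original answer: it assumes the columns that must match are name, vis, f, g and h (the same ones used in the merge call), and the keys vector is a name introduced here for illustration.

```r
library(data.table)

# Rebuild the example tables from the question.
tab <- data.table(
  name  = c(rep("apple", 2), rep("banana", 2), rep("citrus", 2)),
  stim  = c("nc", "alk", "nc", "lem", "haz", "nc"),
  vis   = c(1, 1, 1, 1, 6, 7),
  f     = c(2, 2, 2, 1, 3, 3),
  g     = c(2, 2, 2, 2, 4, 4),
  h     = rep(2, 6),
  value = c(5, 10, 5, 10, 10, 5)
)
tab1 <- tab[stim == "nc"]
tab2 <- tab[stim != "nc"]

# Assumption: these are the columns that have to be equal for a match.
keys <- c("name", "vis", "f", "g", "h")

# Update join: rows of tab2 with a match in tab1 get the tab1 value
# (available as i.value) subtracted; rows without a match are not touched.
tab2[tab1, on = keys, value := value - i.value]

print(tab2)
```

One caveat: if several tab1 rows match the same tab2 row, the nested loop subtracts each of them in turn, whereas an update join keeps only one match per row. If that can happen in the real data, tab1 would need to be aggregated per key first.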