R: convert nested for loop to lapply() for better performance

Question

I am having difficulty converting my nested for loop to lapply() for speed reasons.

I have two data.tables, and I loop over every row of both to compare their contents and, where they are equal, do some calculations. This takes more than 10 minutes for my dataset of about 1000 rows and 360 rows.

In this minimal example it takes less than a second, but each table has only 3 rows:

library(data.table)
library(tictoc)

name <- c(rep("apple",2), rep("banana",2), rep("citrus", 2))
stim <- c("nc","alk" ,"nc",  "lem", "haz", "nc")
vis <- c(1, 1, 1, 1, 6, 7)
f <-c(2,2,2,1,3,3)
g <-c(2,2,2,2,4,4)
h <- c(rep(2,6))
value<- c(5,10,5,10,10,5)
  
tab <- data.table(name, stim, vis, f,g,h,value)

tab1 <- tab[stim == "nc"]
tab2 <- tab[!(stim == "nc")]


tic("looping")

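# Compare every row of tab1 with every row of tab2.
for (i in seq_len(nrow(tab1))) {
  for (n in seq_len(nrow(tab2))) {
    # Columns 3 to ncol - 1 are vis, f, g and h.
    if (identical(tab2[n, name], tab1[i, name]) &&
        identical(tab2[n, vis], tab1[i, vis]) &&
        identical(tab2[n, 3:(length(tab2) - 1), with = FALSE],
                  tab1[i, 3:(length(tab1) - 1), with = FALSE])) {
      # Subtract the matching "nc" baseline value.
      tab2[n, "value"] <- tab2[n, "value"] - tab1[i, "value"]
    }
  }
}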
toc()

I've been looking at the apply family, which seems to be one way to go, but I cannot figure out how to use it here. I appreciate any help!

Before looping, tab1 looks like this:

     name stim vis f g h value
1:  apple   nc   1 2 2 2     5
2: banana   nc   1 2 2 2     5
3: citrus   nc   7 3 4 2     5

tab2 looks like this:

     name stim vis f g h value
1:  apple  alk   1 2 2 2    10
2: banana  lem   1 1 2 2    10
3: citrus  haz   6 3 4 2    10

After looping (I am only interested in tab2), the expected result is:

     name stim vis f g h value
1:  apple  alk   1 2 2 2     5
2: banana  lem   1 1 2 2    10
3: citrus  haz   6 3 4 2    10

Recommended answer

An apply loop will not speed up your computation. In fact, it will make it slower, since your data.frames are already defined and you are only replacing values.

Instead, I suggest an alternative approach using merge(). (Note: your code had some errors and did not run, so I hope I am interpreting your intentions correctly. If not, let me know.)

> merge(tab1, tab2, by = c("name", "vis", "f", "g", "h"), suffixes=c("1", "2"), all.y=T) -> tab3
> tab3$value <- tab3$value2-tab3$value1
> tab3
    name vis f g h stim1 value1 stim2 value2 value
1  apple   1 2 2 2    nc      5   alk     10     5
2 banana   1 1 2 2  <NA>     NA   lem     10    NA
3 citrus   6 3 4 2  <NA>     NA   haz     10    NA
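
In the output above, rows of tab2 with no match in tab1 end up with NA in value, whereas the expected result in the question keeps their original value of 10. A minimal sketch of one way to handle this, assuming the example data from the question and the data.table method of merge(); the names keys and tab3 are only illustrative:

```r
library(data.table)

# Rebuild the example tables from the question.
tab <- data.table(
  name  = rep(c("apple", "banana", "citrus"), each = 2),
  stim  = c("nc", "alk", "nc", "lem", "haz", "nc"),
  vis   = c(1, 1, 1, 1, 6, 7),
  f     = c(2, 2, 2, 1, 3, 3),
  g     = c(2, 2, 2, 2, 4, 4),
  h     = rep(2, 6),
  value = c(5, 10, 5, 10, 10, 5)
)
tab1 <- tab[stim == "nc"]
tab2 <- tab[stim != "nc"]

# Columns that must agree for a tab1 row to count as a match.
keys <- c("name", "vis", "f", "g", "h")

# Keep every row of tab2 (all.y = TRUE), matched or not.
tab3 <- merge(tab1, tab2, by = keys, suffixes = c("1", "2"), all.y = TRUE)

# Subtract only where a match was found; otherwise keep tab2's value.
tab3[, value := ifelse(is.na(value1), value2, value2 - value1)]
```

With this data, value comes out as 5 for apple and stays at 10 for banana and citrus.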

From there you can rename or move your columns as you like.
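
Since the question already uses data.table, another option is an update join, which modifies tab2 in place and leaves its column layout untouched. This is a different technique from the merge() suggested in the answer, shown here only as a sketch under the assumption that the matching columns are name, vis, f, g and h, as in the question's example:

```r
library(data.table)

# Rebuild the example tables from the question.
tab <- data.table(
  name  = rep(c("apple", "banana", "citrus"), each = 2),
  stim  = c("nc", "alk", "nc", "lem", "haz", "nc"),
  vis   = c(1, 1, 1, 1, 6, 7),
  f     = c(2, 2, 2, 1, 3, 3),
  g     = c(2, 2, 2, 2, 4, 4),
  h     = rep(2, 6),
  value = c(5, 10, 5, 10, 10, 5)
)
tab1 <- tab[stim == "nc"]
tab2 <- tab[stim != "nc"]

# Columns that must agree for a tab1 row to count as a match.
keys <- c("name", "vis", "f", "g", "h")

# Update join: for each tab2 row that has a match in tab1,
# subtract tab1's value (referred to as i.value) by reference.
# Rows of tab2 without a match are left unchanged.
tab2[tab1, on = keys, value := value - i.value]

print(tab2)
```

On the example data only the apple row matches, so its value drops from 10 to 5 and the other two rows keep 10. One caveat: if several rows of tab1 match the same row of tab2, the nested loop subtracts each of them in turn, while an update join, as far as I know, applies only one of the matches, so the baseline rows should be unique per key for the two approaches to agree.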
