Optimizing Apply() In R

Views: 93  Published: 2018/4/18 15:31:43  Tags: r, performance, optimization, functional-programming


Question


The goal of the code below is to perform an iterative, pairwise analysis on a data set that has 400 columns and 6000 rows. It takes two columns at a time and performs the analysis on them, then moves on until all possible combinations have been covered.

A small subset of the large data set being used:
        data1     data2     data3     data4
    -0.710003 -0.714271 -0.709946 -0.713645
    -0.710458 -0.715011 -0.710117 -0.714157
     -0.71071 -0.714048 -0.710235 -0.713515
    -0.710255 -0.713991 -0.709722  -0.71397
    -0.710585 -0.714491 -0.710223 -0.713885
    -0.710414 -0.714092 -0.710166  -0.71434
    -0.711255 -0.714116  -0.70945 -0.714173
     -0.71097 -0.714059  -0.70928 -0.714059
    -0.710343 -0.714576 -0.709338 -0.713644
Code using apply():
    # Assumed setup (not shown in the question): start at the first
    # column with an empty result matrix
    currentColumn <- 1
    linearData <- NULL

    # Function
    analysisFunc <- function() {
      # Fetch next data to be compared
      nextColumn <<- currentColumn + 1
      while (nextColumn <= ncol(Data)) {
        # Fetch the two columns on which to perform analysis
        c1 <- Data[, currentColumn]
        c2 <- Data[, nextColumn]
        # Create linear model
        linearModel <- lm(c1 ~ c2)
        # Capture model data from summary
        modelData <- summary(linearModel)
        # Residuals
        residualData <- t(t(modelData$residuals))
        # Keep on appending data
        linearData <<- cbind(linearData, residualData)
        # Fetch next column
        nextColumn <<- nextColumn + 1
      }
      # Increment the counter
      currentColumn <<- currentColumn + 1
    }

    # Apply on function
    apply(Data, 2, function(x) analysisFunc())
I thought that using apply() instead of loops would help me optimize the code. However, it seems to have no major effect: the run time is still more than two hours.
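apply() is itself implemented as an R-level loop over the rows or columns, so replacing a loop with apply() does not by itself make code faster. In the code above, two other things are more likely to dominate the run time: linearData is grown with cbind() on every pass, which copies the whole matrix each time, and every pair runs lm() plus summary(). A minimal sketch of the same i < j pairwise analysis written as a plain for loop that allocates the result matrix once; the small random Data is a stand-in assumption for the real 6000 x 400 data set:

```r
# Stand-in data (assumption): 6 rows x 4 random columns named data1..data4
set.seed(1)
Data <- as.data.frame(matrix(rnorm(24), nrow = 6,
                             dimnames = list(NULL, paste0("data", 1:4))))

n <- ncol(Data)
nPairs <- n * (n - 1) / 2

# Allocate the full result once instead of growing it with cbind()
linearData <- matrix(NA_real_, nrow = nrow(Data), ncol = nPairs)
pairNames <- character(nPairs)

k <- 0
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    k <- k + 1
    c1 <- Data[[i]]
    c2 <- Data[[j]]
    # Same model as the question, lm(c1 ~ c2); for an unweighted fit
    # residuals() equals summary(model)$residuals
    linearData[, k] <- residuals(lm(c1 ~ c2))
    pairNames[k] <- paste(names(Data)[i], names(Data)[j], sep = "_")
  }
}
colnames(linearData) <- pairNames
```

Filling a preallocated column is an in-place update, so the cost of collecting results no longer grows with the number of pairs already processed.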

Am I going wrong in how apply() is used here? Is having a while() loop inside an apply() call a bad idea? Is there any other way I can improve this code?
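One further way to cut the cost: lm() and summary() build a model frame, a QR decomposition and summary statistics for every pair, which is a lot of overhead when only the residuals are kept. For a single predictor with an intercept, the residuals have a closed form, e = (y - mean(y)) - b * (x - mean(x)) with b = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2), and this can be computed for one response against all other columns with a few vectorized matrix operations. This is a swapped-in technique, not part of the original question or answer; pairResiduals() is a hypothetical helper name, and the sketch assumes numeric columns with no missing values and no constant columns (a constant predictor would divide by zero):

```r
# Hypothetical helper: residuals of lm(y ~ x) for every column x of X,
# using the closed-form simple-regression solution instead of lm()
pairResiduals <- function(y, X) {
  X <- as.matrix(X)
  yc <- y - mean(y)                            # centred response
  Xc <- sweep(X, 2, colMeans(X))               # centred predictors, column-wise
  slopes <- colSums(Xc * yc) / colSums(Xc^2)   # OLS slope for each predictor
  yc - sweep(Xc, 2, slopes, `*`)               # (y - mean y) - b * (x - mean x)
}

# Stand-in data (assumption): 9 rows x 4 random columns
set.seed(7)
Data <- as.data.frame(matrix(rnorm(36), nrow = 9,
                             dimnames = list(NULL, paste0("data", 1:4))))

# One call per response column; columns named response_predictor,
# matching the c1_c2 naming for lm(c1 ~ c2)
allRes <- do.call(cbind, lapply(names(Data), function(nm) {
  r <- pairResiduals(Data[[nm]], Data[, names(Data) != nm, drop = FALSE])
  colnames(r) <- paste(nm, colnames(r), sep = "_")
  r
}))

# Cross-check one pair against lm(); expected TRUE up to floating-point tolerance
all.equal(unname(allRes[, "data1_data2"]),
          unname(residuals(lm(data1 ~ data2, data = Data))))
```

Whichever way the residuals are computed, keeping every ordered pair for 400 columns means 6000 x 159,600, roughly 9.6 x 10^8 doubles (about 7.7 GB), so it may be worth processing one response column at a time and saving or summarizing each block before moving on.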

This is the first time I am working with functional programming. Please let me know your suggestions, thanks.
Solution
Consider an expand.grid() of column names, then use mapply(), the multiple-input version of the apply family: you pass two or more vectors/lists and it runs a function across them elementwise. With this approach you avoid growing objects inside a loop and running an inner while loop:

Data
    Data <- read.table(text="data1 data2 data3 data4
    -0.710003 -0.714271 -0.709946 -0.713645
    -0.710458 -0.715011 -0.710117 -0.714157
    -0.71071 -0.714048 -0.710235 -0.713515
    -0.710255 -0.713991 -0.709722 -0.71397
    -0.710585 -0.714491 -0.710223 -0.713885
    -0.710414 -0.714092 -0.710166 -0.71434
    -0.711255 -0.714116 -0.70945 -0.714173
    -0.71097 -0.714059 -0.70928 -0.714059
    -0.710343 -0.714576 -0.709338 -0.713644", header=TRUE)
Process
    # Data frame of all combinations excluding same columns
    modelcols <- subset(expand.grid(c1=names(Data), c2=names(Data),
                                    stringsAsFactors = FALSE),
                        c1 != c2)

    # Function
    analysisFunc <- function(x, y) {
      # Fetch the two columns on which to perform analysis
      c1 <- Data[[x]]
      c2 <- Data[[y]]
      # Create linear model
      linearModel <- lm(c1 ~ c2)
      # Capture model data from summary
      modelData <- summary(linearModel)
      # Residuals
      residualData <- modelData$residuals
    }

    # Apply function to return matrix of residuals
    linearData <- mapply(analysisFunc, modelcols$c1, modelcols$c2)

    # Re-naming matrix columns
    colnames(linearData) <- paste(modelcols$c1, modelcols$c2, sep="_")
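To see how the pairing in this code works without fitting any models, here is a small sketch on three hypothetical column names: expand.grid() varies c1 fastest, subset() drops the self-pairs, and mapply() calls the function once per row, taking the i-th element of each input vector. This is the same ordering pattern as the columns of the output (data2_data1, data3_data1, ...):

```r
# Three hypothetical column names, enough to show the ordering
cols <- c("data1", "data2", "data3")

# All ordered (c1, c2) pairs; expand.grid varies c1 fastest,
# subset() removes the rows where c1 == c2
pairs <- subset(expand.grid(c1 = cols, c2 = cols, stringsAsFactors = FALSE),
                c1 != c2)

# mapply() walks both vectors in parallel: f(c1[1], c2[1]), f(c1[2], c2[2]), ...
labels <- mapply(function(x, y) paste(x, y, sep = "_"),
                 pairs$c1, pairs$c2, USE.NAMES = FALSE)

# labels holds: "data2_data1" "data3_data1" "data1_data2"
#               "data3_data2" "data1_data3" "data2_data3"
labels
```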
Output
        data2_data1   data3_data1   data4_data1   data1_data2   data3_data2   data4_data2
    1  1.440828e-04  8.629813e-05  1.514109e-04  5.583917e-04 -0.0001205821  2.866488e-04
    2 -6.949384e-04 -2.508770e-04 -2.487813e-04 -1.005367e-04 -0.0001263202 -2.145225e-04
    3  2.132192e-04 -4.609125e-04  4.551430e-04 -8.715424e-05 -0.0004593840  4.133856e-04
    4  3.692403e-04  2.182627e-04 -1.116648e-04  3.835538e-04  0.0000408864 -4.244855e-05
    5 -2.025772e-04 -4.032600e-04  5.442655e-05 -8.423568e-05 -0.0003484501  4.986815e-05
    6  2.336373e-04 -2.838073e-04 -4.425935e-04  1.967203e-04 -0.0003805576 -4.109706e-04
    7  2.661145e-05  1.250425e-04 -6.893342e-05 -6.508936e-04  0.0003408023 -2.436194e-04
    8  1.456357e-04  3.991303e-04 -2.496687e-05 -3.501856e-04  0.0004980726 -1.304535e-04
    9 -2.349110e-04  5.701233e-04  2.359596e-04  1.343401e-04  0.0005555326  2.921120e-04
        data1_data3   data2_data3   data4_data3   data1_data4   data2_data4   data3_data4
    1  5.121547e-04  4.313395e-05  2.829814e-04  4.232081e-04  1.795365e-05 -9.584175e-05
    2 -1.649379e-06 -6.684696e-04 -2.349827e-04  1.975728e-04 -7.112598e-04 -3.014160e-04
    3 -2.942277e-04  3.141257e-04  4.029018e-04 -3.420290e-04  2.382149e-04 -3.760631e-04
    4  3.371847e-04  2.859362e-04 -3.420612e-05  3.168009e-04  3.048006e-04  1.062117e-04
    5 -1.651011e-04 -1.308671e-04  3.332034e-05 -5.127719e-05 -1.969902e-04 -3.890484e-04
    6  2.550032e-05  2.586674e-04 -4.196917e-04  3.235528e-04  2.115955e-04 -3.627735e-04
    7 -5.692790e-04  1.157675e-04 -2.277195e-04 -5.922595e-04  1.840773e-04  3.645036e-04
    8 -2.258187e-04  1.445371e-04 -1.077903e-04 -3.583290e-04  2.386756e-04  5.422018e-04
    9  3.812360e-04 -3.628313e-04  3.051868e-04  8.276013e-05 -2.870674e-04  5.122258e-04
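The output contains both data1_data2 and data2_data1: expand.grid() with c1 != c2 keeps ordered pairs, so for 400 columns this fits 400 x 399 = 159,600 models, whereas the question's nested loop fits only the 79,800 pairs that have the earlier column as the response. lm(c1 ~ c2) and lm(c2 ~ c1) are different regressions, so whether both are needed depends on the analysis. If only the question's pairs are wanted, a sketch that swaps combn() in for expand.grid() halves the number of fits; the random Data is a stand-in assumption:

```r
# Stand-in data (assumption): 9 rows x 4 random columns
set.seed(42)
Data <- as.data.frame(matrix(rnorm(36), nrow = 9,
                             dimnames = list(NULL, paste0("data", 1:4))))

# combn() returns a 2 x choose(n, 2) matrix; each column is one unordered
# pair (i < j), i.e. the same pairs the question's while loop visits
pairIdx <- combn(names(Data), 2)

# Same per-pair model as the answer: residuals of lm(c1 ~ c2)
analysisFunc <- function(x, y) {
  c1 <- Data[[x]]
  c2 <- Data[[y]]
  summary(lm(c1 ~ c2))$residuals
}

linearData <- mapply(analysisFunc, pairIdx[1, ], pairIdx[2, ])
colnames(linearData) <- paste(pairIdx[1, ], pairIdx[2, ], sep = "_")
```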
