Optimizing Apply() In R
Problem description
The goal of the code below is to perform recursive and iterative analysis on a data set that has 400 columns and 6000 rows. It takes two columns at a time and performs analysis on them, before moving on to all the possible combinations.

Small subset of the large data set being used:
data1 data2 data3 data4
-0.710003 -0.714271 -0.709946 -0.713645
-0.710458 -0.715011 -0.710117 -0.714157
-0.71071 -0.714048 -0.710235 -0.713515
-0.710255 -0.713991 -0.709722 -0.71397
-0.710585 -0.714491 -0.710223 -0.713885
-0.710414 -0.714092 -0.710166 -0.71434
-0.711255 -0.714116 -0.70945 -0.714173
-0.71097 -0.714059 -0.70928 -0.714059
-0.710343 -0.714576 -0.709338 -0.713644

Code using apply():
# Function
analysisFunc <- function () {
  # Fetch next data to be compared
  nextColumn <<- currentColumn + 1
  while (nextColumn <= ncol(Data)){
    # Fetch the two columns on which to perform analysis
    c1 <- Data[, currentColumn]
    c2 <- Data[, nextColumn]
    # Create linear model
    linearModel <- lm(c1 ~ c2)
    # Capture model data from summary
    modelData <- summary(linearModel)
    # Residuals
    residualData <- t(t(modelData$residuals))
    # Keep on appending data
    linearData <<- cbind(linearData, residualData)
    # Fetch next column
    nextColumn <<- nextColumn + 1
  }
  # Increment the counter
  currentColumn <<- currentColumn + 1
}
# Apply on function
apply(Data, 2, function(x) analysisFunc ())

I thought that, instead of using loops, apply() would help me optimize the code. However, it seems to have no major effect. Run time is more than two hours.

Does anyone think I am going wrong in how apply() has been used? Is having while() within the apply() call not a good idea? Is there any other way I can improve this code?

This is the first time I am working with functional programming. Please let me know your suggestions, thanks.

Recommended answer
Consider an expand.grid of column names and then using mapply, the multiple-input version of the apply family, where you pass two or more vectors/lists and run a function across each input elementwise. With this approach you avoid expanding vectors within a loop and running an inner while loop:

Data
Data <- read.table(text=" data1 data2 data3 data4
-0.710003 -0.714271 -0.709946 -0.713645
-0.710458 -0.715011 -0.710117 -0.714157
-0.71071 -0.714048 -0.710235 -0.713515
-0.710255 -0.713991 -0.709722 -0.71397
-0.710585 -0.714491 -0.710223 -0.713885
-0.710414 -0.714092 -0.710166 -0.71434
-0.711255 -0.714116 -0.70945 -0.714173
-0.71097 -0.714059 -0.70928 -0.714059
-0.710343 -0.714576 -0.709338 -0.713644", header=TRUE)

Process
# Data frame of all combinations excluding same columns
modelcols <- subset(expand.grid(c1=names(Data), c2=names(Data),
                                stringsAsFactors = FALSE), c1!=c2)
# Function
analysisFunc <- function(x,y) {
  # Fetch the two columns on which to perform analysis
  c1 <- Data[[x]]
  c2 <- Data[[y]]
  # Create linear model
  linearModel <- lm(c1 ~ c2)
  # Capture model data from summary
  modelData <- summary(linearModel)
  # Residuals
  residualData <- modelData$residuals
}
# Apply function to return matrix of residuals
linearData <- mapply(analysisFunc, modelcols$c1, modelcols$c2)
# re-naming matrix columns
colnames(linearData) <- paste(modelcols$c1, modelcols$c2, sep="_")

Output
data2_data1 data3_data1 data4_data1 data1_data2 data3_data2 data4_data2
1 1.440828e-04 8.629813e-05 1.514109e-04 5.583917e-04 -0.0001205821 2.866488e-04
2 -6.949384e-04 -2.508770e-04 -2.487813e-04 -1.005367e-04 -0.0001263202 -2.145225e-04
3 2.132192e-04 -4.609125e-04 4.551430e-04 -8.715424e-05 -0.0004593840 4.133856e-04
4 3.692403e-04 2.182627e-04 -1.116648e-04 3.835538e-04 0.0000408864 -4.244855e-05
5 -2.025772e-04 -4.032600e-04 5.442655e-05 -8.423568e-05 -0.0003484501 4.986815e-05
6 2.336373e-04 -2.838073e-04 -4.425935e-04 1.967203e-04 -0.0003805576 -4.109706e-04
7 2.661145e-05 1.250425e-04 -6.893342e-05 -6.508936e-04 0.0003408023 -2.436194e-04
8 1.456357e-04 3.991303e-04 -2.496687e-05 -3.501856e-04 0.0004980726 -1.304535e-04
9 -2.349110e-04 5.701233e-04 2.359596e-04 1.343401e-04 0.0005555326 2.921120e-04
data1_data3 data2_data3 data4_data3 data1_data4 data2_data4 data3_data4
1 5.121547e-04 4.313395e-05 2.829814e-04 4.232081e-04 1.795365e-05 -9.584175e-05
2 -1.649379e-06 -6.684696e-04 -2.349827e-04 1.975728e-04 -7.112598e-04 -3.014160e-04
3 -2.942277e-04 3.141257e-04 4.029018e-04 -3.420290e-04 2.382149e-04 -3.760631e-04
4 3.371847e-04 2.859362e-04 -3.420612e-05 3.168009e-04 3.048006e-04 1.062117e-04
5 -1.651011e-04 -1.308671e-04 3.332034e-05 -5.127719e-05 -1.969902e-04 -3.890484e-04
6 2.550032e-05 2.586674e-04 -4.196917e-04 3.235528e-04 2.115955e-04 -3.627735e-04
7 -5.692790e-04 1.157675e-04 -2.277195e-04 -5.922595e-04 1.840773e-04 3.645036e-04
8 -2.258187e-04 1.445371e-04 -1.077903e-04 -3.583290e-04 2.386756e-04 5.422018e-04
9 3.812360e-04 -3.628313e-04 3.051868e-04 8.276013e-05 -2.870674e-04 5.122258e-04
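
On the full 400-column data set, expand.grid yields 400 x 399 = 159,600 ordered pairs, whereas the loop in the question only pairs each column with the columns after it. If only those unordered pairs and only the residuals are needed, the following is a minimal sketch of one possible way to cut the work further. It assumes that combn() is an acceptable way to build the pairs and that the low-level fitter .lm.fit() from base R's stats package can stand in for lm() plus summary(). The names fastResiduals, pairs and linearDataFast are made up for illustration and are not part of the original answer.

```r
# Same sample data as in the answer above
Data <- read.table(text=" data1 data2 data3 data4
-0.710003 -0.714271 -0.709946 -0.713645
-0.710458 -0.715011 -0.710117 -0.714157
-0.71071 -0.714048 -0.710235 -0.713515
-0.710255 -0.713991 -0.709722 -0.71397
-0.710585 -0.714491 -0.710223 -0.713885
-0.710414 -0.714092 -0.710166 -0.71434
-0.711255 -0.714116 -0.70945 -0.714173
-0.71097 -0.714059 -0.70928 -0.714059
-0.710343 -0.714576 -0.709338 -0.713644", header=TRUE)

# Unordered column pairs (i < j); each column of 'pairs' is one pair
pairs <- combn(names(Data), 2)

# Hypothetical helper: residuals of Data[[x]] ~ Data[[y]] computed with the
# bare-bones fitter, skipping formula parsing and summary()
fastResiduals <- function(x, y) {
  design <- cbind(1, Data[[y]])   # intercept column plus the predictor
  .lm.fit(design, Data[[x]])$residuals
}

# vapply fills an nrow(Data) x ncol(pairs) matrix and checks each result's length
linearDataFast <- vapply(seq_len(ncol(pairs)),
                         function(k) fastResiduals(pairs[1, k], pairs[2, k]),
                         numeric(nrow(Data)))
colnames(linearDataFast) <- paste(pairs[1, ], pairs[2, ], sep="_")

# Sanity check against lm() for one pair
check <- residuals(lm(Data$data1 ~ Data$data2))
print(all.equal(linearDataFast[, "data1_data2"], unname(check)))
```

Whether this helps enough depends on the machine, so it is worth timing on a few dozen columns first. Note also that a 6000-row residual matrix for all 159,600 ordered pairs holds about 957.6 million doubles, roughly 7.7 GB, so memory may become the limit before CPU time does; halving the pairs with combn() halves that as well.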