Storage problem in R: alternative to nested loops for creating an array of matrices and then multiple plots
Problem description
With the following pieces of information, I can easily create an array of matrices:
b0=data.frame(b0_1=c(11.41,11.36),b0_2=c(8.767,6.950))
b1=data.frame(b1_1=c(0.8539,0.9565),b1_2=c(-0.03179,0.06752))
b2=data.frame(b2_1=c(-0.013020 ,-0.016540),b2_2=c(-0.0002822,-0.0026720))
T.val=data.frame(T1=c(1,1),T2=c(1,2),T3=c(2,1))
dt_data=cbind(b0,b1,b2,T.val)
fu.time=seq(0,50,by=0.8)
pat=ncol(T.val) #number of T's
nit=2 #no of rows
pt.array1=array(NA, dim=c(nit,length(fu.time),pat))
for ( it.er in 1:nit){
for ( ti in 1:length(fu.time)){
for (pt in 1:pat){
pt.array1[it.er,ti,pt]=b0[it.er,T.val[it.er,pt]]+b1[it.er,T.val[it.er,pt]]*fu.time[ti]+b2[it.er,T.val[it.er,pt]]*fu.time[ti]^2
}
}
}
pt.array_mean=apply(pt.array1, c(3,2), mean)
pt.array_LCL=apply(pt.array1, c(3,2), quantile, prob=0.25)
pt.array_UCL=apply(pt.array1, c(3,2), quantile, prob=0.975)
Now, with this additional data, I can create three plots as follows:
mydata
pt.ID time IPSS
1 1 0.000000 10
2 1 1.117808 8
3 1 4.504110 5
4 1 6.410959 14
5 1 13.808220 10
6 1 19.890410 4
7 1 28.865750 15
8 1 35.112330 7
9 2 0.000000 6
10 2 1.117808 7
11 2 4.109589 8
12 2 10.093151 7
13 2 16.273973 11
14 2 18.345205 18
15 2 21.567120 14
16 2 25.808220 12
17 2 56.087670 5
18 3 0.000000 8
19 3 1.413699 3
20 3 4.405479 3
21 3 10.389041 8
pdf("plots.pdf")
par(mfrow=c(3,2))
for( pt.no in 1:pat){
plot(IPSS[pt.ID==pt.no]~time[pt.ID==pt.no],xlim=c(0,57),ylim=c(0,35),type="l",col="black",
xlab="f/u time", ylab= "",main = paste("patient", pt.no),data=mydata)
points(IPSS[pt.ID==pt.no]~time[pt.ID==pt.no],data=mydata)
lines(pt.array_mean[pt.no,]~fu.time, col="blue")
lines(pt.array_LCL[pt.no,]~fu.time, col="green")
lines(pt.array_UCL[pt.no,]~fu.time, col="green")
}
dev.off()
The problem arises when the number of rows in each matrix is much bigger, say 10000. It takes too much computation time to create pt.array1 for a large number of rows in b0, b1 and b2.
Is there an alternative way to do this quickly using a built-in function?
Can I avoid the storage allocation for pt.array1, as I am not using it further? I just need pt.array_mean, pt.array_UCL and pt.array_LCL for myplot.
Any help is appreciated.
Recommended answer
There are a couple of other approaches you can employ.
First, you essentially have a model of the form b0 + b1*fu + b2*fu^2. Therefore, you could build the coefficients first and apply fu afterwards:
ind <- expand.grid(nits = seq_len(nit), pats = seq_len(pat))
mat_ind <- cbind(ind[, 'nits'], T.val[as.matrix(ind)])
b_mat <- matrix(c(b0[mat_ind], b1[mat_ind], b2[mat_ind]), ncol = 3)
b_mat
[,1] [,2] [,3]
[1,] 11.410 0.85390 -0.0130200
[2,] 11.360 0.95650 -0.0165400
[3,] 11.410 0.85390 -0.0130200
[4,] 6.950 0.06752 -0.0026720
[5,] 8.767 -0.03179 -0.0002822
[6,] 11.360 0.95650 -0.0165400
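The lookup above relies on indexing with a two-column matrix, where each row is a (row, column) pair. As a minimal sketch using the question's own b0 and T.val (the names it, pt, trt and vals are only illustrative), the two-step lookup for patient 3 looks like this:

```r
b0 <- data.frame(b0_1 = c(11.41, 11.36), b0_2 = c(8.767, 6.950))
T.val <- data.frame(T1 = c(1, 1), T2 = c(1, 2), T3 = c(2, 1))

it <- c(1, 2)   # both iterations (rows)
pt <- c(3, 3)   # patient 3 in each iteration

# Step 1: which coefficient column applies to each (iteration, patient) pair?
trt <- T.val[cbind(it, pt)]   # T.val[1, 3] and T.val[2, 3]

# Step 2: pick that column of b0 for each iteration
vals <- b0[cbind(it, trt)]    # b0[1, 2] and b0[2, 1]
print(vals)
```

These two values correspond to rows 5 and 6 of the first column of b_mat.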
Now if we apply the model to each row, we will get all of your raw results. The only problem is that the shape does not match your original output: each column slice of your array is equivalent to a row slice of my matrix output.
pt_array <- apply(b_mat, 1, function(x) x[1] + x[2] * fu.time + x[3] * fu.time^2)
pt_array[1,]
[1] 11.410 11.360 11.410 6.950 8.767 11.360
pt.array1[, 1, ]
[,1] [,2] [,3]
[1,] 11.41 11.41 8.767
[2,] 11.36 6.95 11.360
That's OK because we can fix the shape as we compute the summary statistics: we just need to convert each row to a 2 x 3 matrix (nit x pat) and take its colMeans2 and colQuantiles:
library(matrixStats)
pt_summary = array(t(apply(pt_array,
1,
function(row) {
M <- matrix(row, ncol = pat)
c(colMeans2(M),colQuantiles(M, probs = c(0.25, 0.975))
)
}
)),
dim = c(length(fu.time), pat, 3),
dimnames = list(NULL, paste0('pat', seq_len(pat)), c('mean', 'LCL', 'UCL'))
)
pt_summary[1, ,] # slice at the first time point (fu.time[1] = 0)
mean LCL UCL
pat1 11.3850 11.37250 11.40875
pat2 9.1800 8.06500 11.29850
pat3 10.0635 9.41525 11.29518
# rm(pt.array1)
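As an aside, because the model is linear in the coefficients, the apply call that builds pt_array can probably be replaced by a single matrix product, which tends to scale better when b_mat has many rows. The following is a sketch rather than a benchmarked recommendation; it uses only the first two rows of b_mat printed above, and the names X, by_product and by_apply are illustrative:

```r
fu.time <- seq(0, 50, by = 0.8)

# First two coefficient rows of b_mat, copied from the printed output
b_mat <- rbind(c(11.41, 0.8539, -0.013020),
               c(11.36, 0.9565, -0.016540))

# Design matrix with columns 1, fu, fu^2 (length(fu.time) x 3)
X <- cbind(1, fu.time, fu.time^2)

# One column of fitted values per row of b_mat
by_product <- X %*% t(b_mat)

# Same computation using the row-wise apply from the answer
by_apply <- apply(b_mat, 1, function(x) x[1] + x[2] * fu.time + x[3] * fu.time^2)

print(all.equal(by_product, by_apply, check.attributes = FALSE))
```

Both forms give a length(fu.time) x nrow(b_mat) matrix, so the reshaping and summary steps stay the same.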
Then, for your final graphing, I simplified things: the data argument can be subset(mydata, pt.ID == pt.no). Additionally, since the summary statistics are now in an array format, matlines allows all three lines to be drawn at once:
par(mfrow=c(3,2))
for( pt.no in 1:pat){
plot(IPSS~time, data=subset(mydata, pt.ID == pt.no),
xlim=c(0,57), ylim=c(0,35),
type="l",col="black", xlab="f/u time", ylab= "",
main = paste("patient", pt.no)
)
points(IPSS~time, data=subset(mydata, pt.ID == pt.no))
matlines(y = pt_summary[,pt.no ,], x = fu.time, col=c("blue", 'green', 'green'))
}
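On the second part of the question (avoiding storage for the full array), one option is to summarise one patient at a time, so that only an nit x length(fu.time) matrix exists at any moment. The sketch below uses base R only and the question's data; summarise_patient is a hypothetical helper name, and the 0.25 and 0.975 probabilities are simply carried over from the question's code:

```r
b0 <- data.frame(b0_1 = c(11.41, 11.36), b0_2 = c(8.767, 6.950))
b1 <- data.frame(b1_1 = c(0.8539, 0.9565), b1_2 = c(-0.03179, 0.06752))
b2 <- data.frame(b2_1 = c(-0.013020, -0.016540), b2_2 = c(-0.0002822, -0.0026720))
T.val <- data.frame(T1 = c(1, 1), T2 = c(1, 2), T3 = c(2, 1))
fu.time <- seq(0, 50, by = 0.8)
nit <- nrow(T.val)
pat <- ncol(T.val)

X <- cbind(1, fu.time, fu.time^2)   # design matrix, length(fu.time) x 3
B0 <- as.matrix(b0); B1 <- as.matrix(b1); B2 <- as.matrix(b2)

# Hypothetical helper: summary curves for a single patient p
summarise_patient <- function(p) {
  idx <- cbind(seq_len(nit), T.val[[p]])      # (iteration, coefficient column) pairs
  coefs <- cbind(B0[idx], B1[idx], B2[idx])   # nit x 3
  curves <- coefs %*% t(X)                    # nit x length(fu.time), this patient only
  rbind(mean = colMeans(curves),
        LCL = apply(curves, 2, quantile, probs = 0.25),
        UCL = apply(curves, 2, quantile, probs = 0.975))
}

# List with one 3 x length(fu.time) summary matrix per patient
summaries <- lapply(seq_len(pat), summarise_patient)
print(summaries[[1]][, 1])   # patient 1 at the first time point
```

The values for patient 1 at the first time point agree with the pat1 row of pt_summary shown earlier. For large nit, the column-wise quantile call could be swapped for matrixStats::colQuantiles, which the answer already uses.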