How can I parallelize a double for loop in R?
Problem description
I've been trying to parallelize my code, which currently uses a double for loop to record results. I've been looking into how to do this with the snow and doParallel packages in R.
If you would like a replicable example, just use

```r
residual_anomalies <- matrix(sample(c('ANOMALY','NO SIGNAL'),300,replace=T),nrow=100)
```

instead of using these three lines

```r
inputfile <- paste0("simulation_",i,"_",metrics[k],"_US.csv")
data <- residuals(inputfile)
residual_anomalies <- conceptdrift(data,length=10,threshold=.05)
```

in the nested for loop. The whole code is below.
```r
source("GetMetrics.R")
source("slowdrift_resampling_vectorized.R")
metrics <- unique(metrics)
num_metrics <- length(metrics)
f1_scores_table_raw = data.frame(matrix(ncol=10,nrow=46))
f1_scores_table_pred = data.frame(matrix(ncol=10,nrow=46))
rownames(f1_scores_table_raw) <- metrics
colnames(f1_scores_table_raw) <- paste0("Sim",1:10)
rownames(f1_scores_table_pred) <- metrics
colnames(f1_scores_table_pred) <- paste0("Sim",1:10)
for(k in 1:num_metrics){
  for(i in 1:10){
    #inputfile <- paste0("simulation_",i,"_",metrics[k],"_US.csv")
    #data <- residuals(inputfile)
    #residual_anomalies <- conceptdrift(data,length=10,threshold=.05)
    #the above is how I get the data frame but I'll create another one for reproducibility.
    residual_anomalies <- as.data.frame(matrix(sample(c('ANOMALY','NO SIGNAL'),300,replace=T),nrow=100))
    names(residual_anomalies) <- c("Raw_Anomaly","Prediction_Anomaly","True_Anomaly")
    #calculate precision and recall for an F1 score
    #first for raw data
    counts <- ifelse(rowSums(residual_anomalies[c("Raw_Anomaly","True_Anomaly")]=='ANOMALY')==2,1,0)
    correct_detections <- sum(counts)
    total_predicted = sum(residual_anomalies$Raw_Anomaly =='ANOMALY')
    total_actual = sum(residual_anomalies$True_Anomaly =='ANOMALY')
    raw_precision = correct_detections / total_predicted
    raw_recall = correct_detections / total_actual
    f1_raw = 2*raw_precision*raw_recall / (raw_precision+raw_recall)
    #then for prediction (DLM,ESP,MLR) data
    counts <- ifelse(rowSums(residual_anomalies[c("Prediction_Anomaly","True_Anomaly")]=='ANOMALY')==2,1,0)
    correct_detections <- sum(counts)
    total_predicted = sum(residual_anomalies$Prediction_Anomaly =='ANOMALY')
    total_actual = sum(residual_anomalies$True_Anomaly =='ANOMALY')
    pred_precision = correct_detections / total_predicted
    pred_recall = correct_detections / total_actual
    f1_pred = 2*pred_precision*pred_recall / (pred_precision+pred_recall)
    f1_scores_table_raw[[k,i]] <- f1_raw
    f1_scores_table_pred[[k,i]] <- f1_pred
  }
}
```
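The raw and prediction branches of the loop body repeat the same arithmetic: count the rows where both the prediction and the truth column say "ANOMALY", then compute precision = correct / predicted, recall = correct / actual, and F1 = 2PR / (P + R). A minimal sketch of that calculation as one reusable function, assuming "ANOMALY" is the positive label as in the question; `f1_score()` is a hypothetical helper name, not part of the original code.

```r
# Hypothetical helper: F1 score of one prediction column against the truth column.
# A row is a correct detection when both columns equal the positive label.
f1_score <- function(pred, truth, positive = "ANOMALY") {
  tp <- sum(pred == positive & truth == positive)  # correct detections
  precision <- tp / sum(pred == positive)          # correct / total predicted
  recall <- tp / sum(truth == positive)            # correct / total actual
  2 * precision * recall / (precision + recall)
}

# In the question's inner loop this would replace both branches:
# f1_raw  <- f1_score(residual_anomalies$Raw_Anomaly, residual_anomalies$True_Anomaly)
# f1_pred <- f1_score(residual_anomalies$Prediction_Anomaly, residual_anomalies$True_Anomaly)

pred  <- c("ANOMALY", "ANOMALY", "NO SIGNAL", "NO SIGNAL")
truth <- c("ANOMALY", "NO SIGNAL", "ANOMALY", "NO SIGNAL")
f1_score(pred, truth)  # 0.5: one hit out of two predicted and two actual
```

The element-wise `&` gives the same count as the `rowSums(...) == 2` test in the original, and a self-contained function is also convenient to ship to parallel workers.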
Previously I used foreach with %dopar% on the outer loop, but I kept getting the error '%dopar%' not found. Should I parallelize both loops or just one?
Also, I know foreach returns a list that I store in a variable, but can other variables still be written to inside my foreach loop? For example, I still want to record results in my f1_scores_table_raw and f1_scores_table_pred data frames.
Thanks!
foreach can parallelize both loops for you if you join the loop levels with the %:% nesting operator (see the "nesting" vignette):
```r
require(foreach)
# Register parallel backend
foreach (k = 1:num_metrics) %:% # nesting operator
  foreach (i = 1:10) %dopar% {
    # code to parallelise
  }
```
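A fuller sketch of the nested approach, assuming the foreach and doParallel packages are installed and using random stand-in data in place of the question's residuals()/conceptdrift() pipeline. It also touches the two side questions. %dopar% is defined by foreach, so the package has to be attached with library(foreach) before %dopar% is called; a missing library call is a likely cause of the '%dopar%' not found error. And because iterations may run in separate worker processes, assigning into f1_scores_table_raw inside the loop body would only change the worker's copy, so the sketch returns both scores from each iteration and builds the tables afterwards on the main process. The worker count (2), the placeholder `metrics` vector and the `f1_score()` helper are illustrative assumptions.

```r
library(foreach)     # defines %dopar% and %:%; without it %dopar% cannot be found
library(parallel)    # makeCluster(), stopCluster()
library(doParallel)  # registerDoParallel()

# Hypothetical helper: F1 score of one prediction column against the truth column
f1_score <- function(pred, truth, positive = "ANOMALY") {
  tp <- sum(pred == positive & truth == positive)
  precision <- tp / sum(pred == positive)
  recall <- tp / sum(truth == positive)
  2 * precision * recall / (precision + recall)
}

metrics <- paste0("metric_", 1:3)  # placeholder; the question gets these from GetMetrics.R
num_metrics <- length(metrics)
n_sims <- 10

cl <- makeCluster(2)    # 2 workers is an arbitrary choice for this sketch
registerDoParallel(cl)  # with no registered backend, %dopar% runs sequentially and warns

# res[[k]][[i]] holds the value returned for metric k and simulation i
res <- foreach(k = seq_len(num_metrics)) %:%  # nesting operator
  foreach(i = seq_len(n_sims)) %dopar% {
    # Stand-in data, as in the question's reproducible example; the real code
    # would read the file for simulation i and metric k here instead
    ra <- as.data.frame(matrix(sample(c("ANOMALY", "NO SIGNAL"), 300, replace = TRUE),
                               nrow = 100))
    names(ra) <- c("Raw_Anomaly", "Prediction_Anomaly", "True_Anomaly")
    # Return both scores rather than assigning into a table on the worker
    c(raw = f1_score(ra$Raw_Anomaly, ra$True_Anomaly),
      pred = f1_score(ra$Prediction_Anomaly, ra$True_Anomaly))
  }

stopCluster(cl)

# Assemble num_metrics x n_sims tables on the main process
f1_scores_table_raw <- t(sapply(res, function(row) sapply(row, `[[`, "raw")))
f1_scores_table_pred <- t(sapply(res, function(row) sapply(row, `[[`, "pred")))
dimnames(f1_scores_table_raw) <- list(metrics, paste0("Sim", seq_len(n_sims)))
dimnames(f1_scores_table_pred) <- dimnames(f1_scores_table_raw)
```

According to the foreach nesting vignette, %:% turns the two loops into a single loop, so all num_metrics * 10 iterations become parallel tasks; that answers "both or just one" without parallelizing each level separately. The tables come out as matrices here; wrap them in as.data.frame() if the rest of the code expects data frames.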