For loop in R that calculates for the group and then the components


Problem description


I have a data set and a loop that performs numerous calculations on it: the individual components of the set are split off into subsets and processed one by one. However, I first need to run the same calculations on the original data set as a whole.

Take a fictional data set called masterdata with 3 components (column D1) and numerous variables (X2-X10):

# masterdata
#   D1 X2 X3 X4 X5 X6 X7 X8 X9 X10
#   A  NA NA NA NA NA NA NA NA  NA
#   B  NA NA NA NA NA NA NA NA  NA
#   C  NA NA NA NA NA NA NA NA  NA
#   B  NA NA NA NA NA NA NA NA  NA
#   B  NA NA NA NA NA NA NA NA  NA
#   C  NA NA NA NA NA NA NA NA  NA
#   C  NA NA NA NA NA NA NA NA  NA
#   A  NA NA NA NA NA NA NA NA  NA
#   B  NA NA NA NA NA NA NA NA  NA
#   A  NA NA NA NA NA NA NA NA  NA

A loop is in place to split off a subset for component A, perform the calculations, output the results and then repeat this for B and C:

Component.List = c("A", "B", "C")

for (k in seq_along(Component.List)) {
  subdata = subset(masterdata, D1 == Component.List[k])
  # Numerous calculations performed on "subdata" within the loop
}
# End of loop

What I am trying to do is initially perform the same numerous calculations against the whole of masterdata and then start looping through the individual components.

Part of the output of the calculations is two vectors, which are placed into the kth columns of two data frames created just before the loop runs:

# Prior to the start of the loop two frames below created
Components = 3 # In this example 3 components in column D1 - "A", "B", "C"

Result.Frame.V1 = as.data.frame(matrix(0, nrow = 200, ncol = Components))
Result.Frame.V2 = as.data.frame(matrix(0, nrow = 200, ncol = Components))

# Loop runs and contains all of the calculations; within the calculations, the last two
# lines below place the two generated vectors into the kth columns of the frames.

Result.Frame.V1[,k] = V1.Result
Result.Frame.V2[,k] = V2.Result

# First run of the loop for "A" will place the outputs in the 1st columns 
# Second run of the loop for "B" will place the outputs in the 2nd columns, etc.
# With the expansion to also calculate against the whole group, the above data frames
# would be expanded to an extra column that would hold the result vector for the whole 
# masterdata run through the calculations 

My initial theoretical solution is to write out every calculation once for masterdata ahead of the loop and then run the loop above. However, the calculations are hundreds of lines of code!

Is it possible to incorporate into the For loop a way to calculate for the original data and then continue cycling through the components?

Solution

If you are outputting data frames, the key is to create a function that performs your calculations when passed a data frame and returns a data frame. In the example below, the function is called your_function().
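As a rough illustration of the shape such a function might take, here is a minimal sketch. The body of your_function() and the values in masterdata are invented placeholders, since the real calculations and data are not shown in the question; the only point is that the same function accepts either the full data set or a subset.

```r
# Made-up example data: D1 follows the question, the X2/X3 values are invented
masterdata <- data.frame(
  D1 = c("A", "B", "C", "B", "B", "C", "C", "A", "B", "A"),
  X2 = 1:10,
  X3 = seq(10, 100, by = 10)
)

# Hypothetical stand-in for the hundreds of lines of calculations:
# takes a data frame and returns a one-row data frame of results
your_function <- function(df) {
  data.frame(
    n_rows  = nrow(df),
    mean_X2 = mean(df$X2),
    sum_X3  = sum(df$X3)
  )
}

# The same function works on the whole data set or on any subset of it
full_result <- your_function(masterdata)
a_result    <- your_function(subset(masterdata, D1 == "A"))
```

Wrapping the calculations this way means they are written once and called as many times as needed, which avoids duplicating the code for the whole-group run.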

For simplicity, a three-stage process is used: first create the output data frame for the overall data set, then use lapply to perform the same calculations on each sub data set. The sub data set results are bound together into a single data frame, which is finally combined with the output for the full data set.

Note: I created a new variable called "Subset" so that every row of the output can be traced back to the set it was calculated from.

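library(dplyr)

# Run the calculations on the whole data set and label the result
FullSet <- your_function(masterdata) %>% mutate(Subset = "Full")

# Run the same calculations on each component and stack the results
SubSets <- lapply(as.character(unique(masterdata$D1)), function(n) {
  masterdata %>%
    filter(D1 == n) %>%
    your_function() %>%
    mutate(Subset = n)
}) %>% bind_rows()

# Full-data result first, then one block per component
FinalSet <- bind_rows(FullSet, SubSets)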
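If the existing for loop and result frames need to stay as they are, one possible alternative is to add a label for the whole data set to the list being looped over, so the first pass uses masterdata itself and the result frame gains one extra column. The sketch below uses base R only. The "All" label, the Run.List name, the X2 values, and the four-element V1.Result are assumptions made for illustration (the question uses 200 rows and does not show the calculations); Result.Frame.V2 would be handled the same way.

```r
# Made-up example data: D1 follows the question, the X2 values are invented
masterdata <- data.frame(
  D1 = c("A", "B", "C", "B", "B", "C", "C", "A", "B", "A"),
  X2 = 1:10
)

Component.List <- c("A", "B", "C")
# "All" is a hypothetical label standing for the whole data set
Run.List <- c("All", Component.List)

n_out <- 4  # length of each result vector (200 in the question)
Result.Frame.V1 <- as.data.frame(matrix(0, nrow = n_out, ncol = length(Run.List)))
names(Result.Frame.V1) <- Run.List

for (k in seq_along(Run.List)) {
  # First pass uses the full data; later passes use one component each
  subdata <- if (Run.List[k] == "All") {
    masterdata
  } else {
    subset(masterdata, D1 == Run.List[k])
  }

  # Placeholder for the real calculations on "subdata"
  V1.Result <- c(min(subdata$X2), mean(subdata$X2), max(subdata$X2), nrow(subdata))

  Result.Frame.V1[, k] <- V1.Result
}
```

With this layout the first column holds the whole-group result and columns 2 to 4 hold A, B and C, so the body of the loop is written only once.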

If you want to run the process in parallel for speed, replace lapply with mclapply from the parallel package:

mclapply(unique(masterdata$D1), function..., mc.cores = detectCores())
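A fuller sketch of the parallel variant might look like the following. It uses base R plus the parallel package; your_function() and the X2 values are again invented placeholders, and the core count of 2 is an arbitrary choice. Because mclapply relies on forking, which is not available on Windows, the sketch falls back to a single core there.

```r
library(parallel)

# Made-up example data: D1 follows the question, the X2 values are invented
masterdata <- data.frame(
  D1 = c("A", "B", "C", "B", "B", "C", "C", "A", "B", "A"),
  X2 = 1:10
)

# Hypothetical stand-in for the real calculations
your_function <- function(df) {
  data.frame(n_rows = nrow(df), mean_X2 = mean(df$X2))
}

# Forking is unavailable on Windows, so use one core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

SubSets <- mclapply(as.character(unique(masterdata$D1)), function(n) {
  out <- your_function(masterdata[masterdata$D1 == n, ])
  out$Subset <- n  # label each result with its component
  out
}, mc.cores = n_cores)

# mclapply returns a list in input order; stack it into one data frame
SubSets <- do.call(rbind, SubSets)
```

Parallel execution only pays off when each call to the calculation function is reasonably expensive; for quick calculations the overhead of forking can outweigh the gain.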
