R: dplyr solution for for-loop with initial conditions set

Problem description

I have data covering 40 days of the year, along with some daily variables:

set.seed(123)
df <- data.frame(day = 1:40,rain = runif(40,min = 0, max = 3), petc = runif(40, min = 0.3, max = 8),swc = runif(40, min = 27.01, max = 117.43))

I want to calculate another variable called aetc for each day which is calculated as follows:

SW.ini <- 2 # setting some initial values 
SW.max <- 5
SW.min <- 0

For day 1:

1) Define a variable called PAW(day1) = SW.ini + rain(day1)

2) If PAW(day1) >= SWC(day1), aetc(day1) = petc(day1); if PAW(day1) < SWC(day1), aetc(day1) = PAW(day1)/SWC(day1) * petc(day1)

3) Check whether aetc(day1) > PAW(day1). If yes, aetc(day1) = PAW(day1)

4) Update SW(day1) = SW.ini + rain(day1) - aetc(day1)

5) If SW(day1) > SW.max, SW(day1) = SW.max. Similarly, if SW(day1) < SW.min, SW(day1) = SW.min
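As a sketch of steps 1-5 for a single day, here is a minimal C++ function. C++ is also the language the Rcpp answer below uses. The names `DayResult` and `water_balance_day` are hypothetical. The function assumes SWC > 0 and SW.min <= SW.max. It illustrates the rules as stated and is not the asker's or the answerer's code.

```cpp
#include <algorithm>
#include <cassert>

// Outputs of one day of the soil-water balance (steps 1-5).
struct DayResult {
    double paw;   // plant-available water, PAW
    double aetc;  // actual crop evapotranspiration, aetc
    double sw;    // soil water carried into the next day, SW
};

// Hypothetical helper. sw_prev is SW.ini on day 1 and SW(day - 1) later.
// Assumes swc > 0 and sw_min <= sw_max.
DayResult water_balance_day(double sw_prev, double rain, double swc,
                            double petc, double sw_min, double sw_max) {
    assert(swc > 0.0 && sw_min <= sw_max);
    DayResult r;
    // Step 1: available water = carried-over soil water + today's rain
    r.paw = sw_prev + rain;
    // Step 2: full petc when PAW >= SWC, otherwise scale petc by PAW/SWC
    r.aetc = (r.paw >= swc) ? petc : r.paw / swc * petc;
    // Step 3: cannot remove more water than is available
    r.aetc = std::min(r.aetc, r.paw);
    // Step 4: water balance
    r.sw = sw_prev + rain - r.aetc;
    // Step 5: clamp SW to [sw_min, sw_max]
    r.sw = std::max(sw_min, std::min(r.sw, sw_max));
    return r;
}
```

For example, with SW.ini = 2, rain = 1, SWC = 2 and petc = 1, PAW is 3. That is at least SWC, so aetc is the full petc of 1, and SW stays at 2.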

Repeat for day 2:

1) Define PAW(day2) = SW(day1) + rain(day2)

2) If PAW(day2) >= SWC(day2), aetc(day2) = petc(day2); if PAW(day2) < SWC(day2), aetc(day2) = PAW(day2)/SWC(day2) * petc(day2)

3) Check whether aetc(day2) > PAW(day2). If yes, aetc(day2) = PAW(day2)

4) Update SW(day2) = SW(day1) + rain(day2) - aetc(day2)

5) If SW(day2) > SW.max, SW(day2) = SW.max. Similarly, if SW(day2) < SW.min, SW(day2) = SW.min
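Day 2 differs from day 1 only in where the starting soil water comes from, so the whole year is a recurrence: each day needs the clamped SW of the day before. A single column-wise calculation cannot express that directly. Below is a minimal C++ sketch of the day-by-day loop. The names `YearResult` and `water_balance_year` are hypothetical, and the sketch assumes the three input vectors have equal length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Daily outputs for one run of consecutive days (e.g. one year).
struct YearResult {
    std::vector<double> paw, aetc, sw;
};

// Hypothetical helper: applies steps 1-5 day by day. Day 1 starts from
// sw_ini; each later day starts from the previous day's clamped SW.
YearResult water_balance_year(const std::vector<double>& rain,
                              const std::vector<double>& swc,
                              const std::vector<double>& petc,
                              double sw_ini, double sw_min, double sw_max) {
    assert(rain.size() == swc.size() && rain.size() == petc.size());
    const std::size_t n = rain.size();
    YearResult out{std::vector<double>(n), std::vector<double>(n),
                   std::vector<double>(n)};
    double sw_prev = sw_ini;  // SW.ini feeds day 1
    for (std::size_t d = 0; d < n; ++d) {
        const double paw = sw_prev + rain[d];                             // step 1
        double aetc = (paw >= swc[d]) ? petc[d] : paw / swc[d] * petc[d]; // step 2
        aetc = std::min(aetc, paw);                                       // step 3
        const double balance = sw_prev + rain[d] - aetc;                  // step 4
        const double sw = std::max(sw_min, std::min(balance, sw_max));    // step 5
        out.paw[d] = paw;
        out.aetc[d] = aetc;
        out.sw[d] = sw;
        sw_prev = sw;  // today's SW becomes tomorrow's starting point
    }
    return out;
}
```

Take rain = {1, 0.5}, SWC = {2, 10}, petc = {1, 5} and SW.ini = 2. Day 2 then starts from SW(day1) = 2, so PAW(day2) = 2 + 0.5 = 2.5 and aetc(day2) = 2.5/10 * 5 = 1.25.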

Here's my elegant for loop to do this:

      # Output columns, filled day by day
      df$PAW <- NA
      df$aetc <- NA
      df$SW <- NA

      # Day 1 starts from the initial soil water SW.ini
      df$PAW[1] <- SW.ini + df$rain[1]

      df$aetc[1] <- ifelse(df$PAW[1] >= df$swc[1], df$petc[1],(df$PAW[1]/df$swc[1])*df$petc[1])
      df$aetc[1] <- ifelse(df$aetc[1] > df$PAW[1], df$PAW[1], df$aetc[1])
      df$SW[1] <- SW.ini + df$rain[1] -  df$aetc[1]
      df$SW[1] <- ifelse(df$SW[1] > SW.max, SW.max, ifelse(df$SW[1] < SW.min, SW.min, df$SW[1]))

      # Every later day starts from the previous day's SW
      for (day in 2:nrow(df)){

        df$PAW[day] <- df$SW[day - 1] + df$rain[day]
        df$aetc[day] <- ifelse(df$PAW[day] >= df$swc[day], df$petc[day], (df$PAW[day]/df$swc[day]) * df$petc[day])
        df$aetc[day] <- ifelse(df$aetc[day] > df$PAW[day], df$PAW[day],df$aetc[day])
        df$SW[day] <- df$SW[day - 1] + df$rain[day] -  df$aetc[day]
        df$SW[day] <- ifelse(df$SW[day] > SW.max, SW.max, ifelse(df$SW[day] < SW.min, SW.min, df$SW[day]))
      }

My problem is that this is just one year of data, and I want to run it for multiple years.

      set.seed(123)
      df <- data.frame(year = 1980:2015, day = rep(1:40, each = 36),
                       rain = runif(40*36, min = 0, max = 3),
                       petc = runif(40*36, min = 0.3, max = 8),
                       swc = runif(40*36, min = 27.01, max = 117.43))

So I want to do something like:

                df %>% group_by(year) # and then run the above function for each year. 

Is there a dplyr or any other solution to this?

Thanks

Recommended answer


Note: I originally posted this answer on your follow-up question, R: for loop within a foreach loop, but after seeing this one, it seems far more relevant here. (I don't address anything related to parallelization in my answer, which was the topic of your follow-up.)



Using Rcpp and data.table

Compiling the logic with C++ and applying it by group using data.table grouping operations gives a ~2,000x speed-up from your baseline, far greater than you might hope to get by parallelizing.

On your original example, which had 39,420,000 rows, this executes on my machine in 1.883 seconds; on the revised one with 28,800 rows, it executes in 0.004 seconds.

library(data.table)
library(Rcpp)

Define and compile a C++ function, CalcSW(), inline in the R script:

One note: indexing in C/C++ starts at 0, unlike R, which starts at 1. That's why the indices are different here.

Rcpp::cppFunction('
List CalcSW(NumericVector SW_ini,
            NumericVector SW_max,
            NumericVector rain,
            NumericVector swc,
            NumericVector PETc) {

  int n = SW_ini.length();
  NumericVector SW(n);
  NumericVector PAW(n);
  NumericVector aetc(n);

  double SW_ini_glob = SW_ini[0];
  double SW_max_glob = SW_max[0];

  SW[0] = SW_ini_glob;
  PAW[0] = SW[0] + rain[0];

  if (PAW[0] > swc[0]){
    aetc[0] = PETc[0];
  } else {
    aetc[0] = PAW[0]/swc[0]*PETc[0];
  }

  if (aetc[0] > PAW[0]){
    aetc[0] = PAW[0];
  }

  SW[0] = SW[0] + rain[0] - aetc[0];

  if(SW[0] > SW_max_glob){
    SW[0] = SW_max_glob;
  }

  if(SW[0] < 0){
    SW[0] = 0;
  }

  // remaining days: start from the previous day soil water (SW[i-1]) plus the rain of day i
  for (int i = 1; i < n; i++) {

    PAW[i] = SW[i-1] + rain[i];

    if (PAW[i] > swc[i]){
      aetc[i] = PETc[i];
    } else {
      aetc[i] = PAW[i]/swc[i]*PETc[i];
    }

    if (aetc[i] > PAW[i]){
      aetc[i] = PAW[i];
    }

    SW[i] = SW[i-1] + rain[i] - aetc[i];

    if(SW[i] > SW_max_glob){
      SW[i] = SW_max_glob;
    }

    if(SW[i] < 0){
      SW[i] = 0;
    }
  }
  return Rcpp::List::create(Rcpp::Named("SW") = SW,
                            Rcpp::Named("PAW") = PAW,
                            Rcpp::Named("aetc") = aetc);
}')

Create data.table

df <- data.table(loc.id = rep(1:10, each = 80*36), 
                 year = rep(rep(1980:2015, each = 80), times = 10),
                 day = rep(rep(1:80, times = 36),times = 10),
                 rain = runif(10*36*80, min = 0 , max = 5),
                 swc = runif(10*36*80,min = 0, max = 50),
                 SW_max = rep(runif(10, min = 100, max = 200), each = 80*36),
                 SW_ini = runif(10*36*80),
                 PETc = runif(10*36*80, min = 0 , max = 1.3),
                 SW = as.numeric(NA),
                 PAW = as.numeric(NA), 
                 aetc = as.numeric(NA))

setkey(df, loc.id, year, day)

Execute CalcSW() on df for each combination of loc.id and year, assigning the returned values to the three columns simultaneously:

system.time({
  df[,  c("SW","PAW","aetc") := CalcSW(SW_ini,
                                       SW_max,
                                       rain,
                                       swc,
                                       PETc), keyby = .(loc.id, year)]
})

...

   user  system elapsed 
  0.004   0.000   0.004 
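What `keyby = .(loc.id, year)` contributes here is that each (loc.id, year) block is handed to CalcSW() separately. The soil water therefore restarts from that group's first SW_ini value instead of carrying over from the previous year. Below is a rough C++ analogue of that grouping. It assumes the rows are already sorted by (loc.id, year, day), as `setkey()` ensures. `Row`, `make_row` and `calc_sw_by_group` are hypothetical names, and SW.min is taken as 0, as in CalcSW().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One row of the table; inputs first, outputs (paw, aetc, sw) last.
struct Row {
    int loc_id, year;
    double rain, swc, petc, sw_ini, sw_max;
    double paw, aetc, sw;
};

// Hypothetical helper: builds a row with the outputs zeroed.
Row make_row(int loc_id, int year, double rain, double swc, double petc,
             double sw_ini, double sw_max) {
    return Row{loc_id, year, rain, swc, petc, sw_ini, sw_max, 0.0, 0.0, 0.0};
}

// Rough analogue of df[, c("SW","PAW","aetc") := CalcSW(...), keyby = .(loc.id, year)].
// Assumes rows are sorted by (loc_id, year, day). Each group restarts from the
// sw_ini and sw_max of its own first row; SW is clamped to [0, sw_max].
void calc_sw_by_group(std::vector<Row>& rows) {
    std::size_t start = 0;
    while (start < rows.size()) {
        // Find where the current (loc_id, year) group ends.
        std::size_t end = start;
        while (end < rows.size() && rows[end].loc_id == rows[start].loc_id &&
               rows[end].year == rows[start].year) {
            ++end;
        }
        double sw_prev = rows[start].sw_ini;  // reset for every group
        const double sw_max = rows[start].sw_max;
        assert(sw_max >= 0.0);
        for (std::size_t i = start; i < end; ++i) {
            Row& r = rows[i];
            r.paw = sw_prev + r.rain;
            r.aetc = (r.paw >= r.swc) ? r.petc : r.paw / r.swc * r.petc;
            r.aetc = std::min(r.aetc, r.paw);
            r.sw = std::max(0.0, std::min(sw_prev + r.rain - r.aetc, sw_max));
            sw_prev = r.sw;
        }
        start = end;
    }
}
```

The first row of a new year starts again from that year's SW_ini rather than from the last SW of the previous year. That reset is the behaviour a per-year `group_by()` is after.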

Results:

head(df)

...

   loc.id year day       rain       swc   SW_max     SW_ini      PETc       SW      PAW       aetc
1:      1 1980   1 0.35813251 28.360715 177.3943 0.69116310 0.2870478 1.038675 1.049296 0.01062025
2:      1 1980   2 1.10331116 37.013022 177.3943 0.02742273 0.4412420 2.125335 1.396808 0.01665171
3:      1 1980   3 1.76680011 32.509970 177.3943 0.66273062 1.1071233 3.807561 2.483467 0.08457420
4:      1 1980   4 3.20966558  8.252797 177.3943 0.12220454 0.3496968 6.840713 4.165693 0.17651342
5:      1 1980   5 1.32498191 14.784203 177.3943 0.66381497 1.2168838 7.573160 7.198845 0.59253503
6:      1 1980   6 0.02547458 47.903637 177.3943 0.21871598 1.0864713 7.418750 7.931292 0.17988449

I'm not 100% sure I implemented your logic perfectly. I implemented it very similarly to how you laid it out, though, so it should be straightforward to tweak anything I may have missed.
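Because the logic may need tweaking, one way to check an implementation is to test the returned columns against the rules in the question. The following C++ sketch recomputes each rule from the neighbouring values and returns false at the first mismatch. The function name `satisfies_spec` and the tolerance are assumptions, and SW.min is taken as 0. For example, it returns false if PAW was built from the wrong day's rain.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical checker for one (loc.id, year) group: verifies SW, PAW and
// aetc against steps 1-5, with SW.min = 0. tol absorbs rounding error.
bool satisfies_spec(const std::vector<double>& rain,
                    const std::vector<double>& swc,
                    const std::vector<double>& petc,
                    const std::vector<double>& sw,
                    const std::vector<double>& paw,
                    const std::vector<double>& aetc,
                    double sw_ini, double sw_max, double tol = 1e-9) {
    const std::size_t n = rain.size();
    assert(swc.size() == n && petc.size() == n && sw.size() == n &&
           paw.size() == n && aetc.size() == n);
    for (std::size_t i = 0; i < n; ++i) {
        const double prev = (i == 0) ? sw_ini : sw[i - 1];
        // Step 1: PAW uses the previous SW and the same day's rain
        if (std::fabs(paw[i] - (prev + rain[i])) > tol) return false;
        // Steps 2-3: expected aetc, capped at PAW
        double expected = (paw[i] >= swc[i]) ? petc[i] : paw[i] / swc[i] * petc[i];
        if (expected > paw[i]) expected = paw[i];
        if (std::fabs(aetc[i] - expected) > tol) return false;
        // Steps 4-5: water balance, then clamp to [0, sw_max]
        double balance = prev + rain[i] - aetc[i];
        if (balance > sw_max) balance = sw_max;
        if (balance < 0.0) balance = 0.0;
        if (std::fabs(sw[i] - balance) > tol) return false;
    }
    return true;
}
```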

One other note: it's much easier to write C++ with the auto-indenting and code highlighting you get (whether you're using RStudio or Emacs) if you create a separate file, named something like TestCode.cpp, formatted like the one below.

Then you can either use Rcpp::sourceCpp("TestCode.cpp") to compile the function from your R script, or you can copy and paste everything after the header lines (the #include, using namespace, and // [[Rcpp::export]] lines) as a character string into Rcpp::cppFunction(), as I did above.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List CalcSW(NumericVector SW_ini,
            NumericVector SW_max,
            NumericVector rain,
            NumericVector swc,
            NumericVector PETc) {

  int n = SW_ini.length();
  NumericVector SW(n);
  NumericVector PAW(n);
  NumericVector aetc(n);

  double SW_ini_glob = SW_ini[0];
  double SW_max_glob = SW_max[0];

  SW[0] = SW_ini_glob;
  PAW[0] = SW[0] + rain[0];

  if (PAW[0] > swc[0]){
    aetc[0] = PETc[0];
  } else {
    aetc[0] = PAW[0]/swc[0]*PETc[0];
  }

  if (aetc[0] > PAW[0]){
    aetc[0] = PAW[0];
  }

  SW[0] = SW[0] + rain[0] - aetc[0];

  if(SW[0] > SW_max_glob){
    SW[0] = SW_max_glob;
  }

  if(SW[0] < 0){
    SW[0] = 0;
  }

  // remaining days: start from the previous day soil water (SW[i-1]) plus the rain of day i
  for (int i = 1; i < n; i++) {

    PAW[i] = SW[i-1] + rain[i];

    if (PAW[i] > swc[i]){
      aetc[i] = PETc[i];
    } else {
      aetc[i] = PAW[i]/swc[i]*PETc[i];
    }

    if (aetc[i] > PAW[i]){
      aetc[i] = PAW[i];
    }

    SW[i] = SW[i-1] + rain[i] - aetc[i];

    if(SW[i] > SW_max_glob){
      SW[i] = SW_max_glob;
    }

    if(SW[i] < 0){
      SW[i] = 0;
    }
  }
  return Rcpp::List::create(Rcpp::Named("SW") = SW,
                            Rcpp::Named("PAW") = PAW,
                            Rcpp::Named("aetc") = aetc);
}
