Simulating a timeseries in dplyr instead of using a for loop

Question


So, while lag and lead in dplyr are great, I want to simulate a timeseries of something like population growth. My old school code would look something like:

tdf <- data.frame(time=1:5, pop=50)
for(i in 2:5){
  tdf$pop[i] = 1.1*tdf$pop[i-1]
}

which produces

  time    pop
1    1 50.000
2    2 55.000
3    3 60.500
4    4 66.550
5    5 73.205

I feel like there has to be a dplyr or tidyverse way to do this (as much as I love my for loop).

But, something like

tdf <- data.frame(time=1:5, pop=50) %>%
  mutate(pop = 1.1*lag(pop))

which would have been my first guess just produces

  time pop
1    1  NA
2    2  55
3    3  55
4    4  55
5    5  55

I feel like I'm missing something obvious.... what is it?

Note - this is a trivial example - my real examples use multiple parameters, many of which are time-varying (I'm simulating forecasts under different GCM scenarios), so, the tidyverse is proving to be a powerful tool in bringing my simulations together.

Answer

Reduce (or its purrr variants, if you like) is what you want for cumulative functions that don't already have a cum* version written:

library(dplyr)

# x is the running result; y (the current element of pop) is unused here
data.frame(time = 1:5, pop = 50) %>%
    mutate(pop = Reduce(function(x, y){x * 1.1}, pop, accumulate = TRUE))

##   time    pop
## 1    1 50.000
## 2    2 55.000
## 3    3 60.500
## 4    4 66.550
## 5    5 73.205
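
The `lag()` attempt in the question returns 55 in every row because `mutate()` evaluates `1.1 * lag(pop)` once, on the original column of 50s, whereas `Reduce()` feeds each result into the next step. The function's second argument (`y` above, unused there) receives the current element, which is one place a time-varying parameter could go. A minimal base R sketch, assuming a hypothetical `rate` column in which `rate[i]` is the growth factor from step `i - 1` to step `i`:

```r
# Hypothetical growth factors; rate[1] is a placeholder because the first
# population value is the starting point and is never multiplied.
tdf <- data.frame(time = 1:5, rate = c(NA, 1.1, 1.2, 1.0, 0.9))

# Start from 50 (init) and multiply by each later rate in turn,
# keeping every intermediate value (accumulate = TRUE).
tdf$pop <- Reduce(function(prev, r) prev * r, tdf$rate[-1],
                  init = 50, accumulate = TRUE)

print(tdf)
```

For the constant-rate case in the question, a `cum*` function does already exist: `50 * cumprod(c(1, rep(1.1, 4)))` should give the same column as the loop.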

or with purrr,

library(dplyr)
library(purrr)

data.frame(time = 1:5, pop = 50) %>%
    mutate(pop = accumulate(pop, ~.x * 1.1))

##   time    pop
## 1    1 50.000
## 2    2 55.000
## 3    3 60.500
## 4    4 66.550
## 5    5 73.205
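
Since the question mentions time-varying parameters, here is a hedged sketch of the purrr version with a per-step rate. The `rate` column and its values are hypothetical, not from the original post. In the formula, `.x` is the running population and `.y` is the next rate; `.init` supplies the starting value, so the first (unused) rate is dropped to keep the result at five rows.

```r
library(dplyr)
library(purrr)

# Hypothetical growth factors; rate[i] applies to the step from time i-1 to i,
# so the first entry is a placeholder.
sim <- data.frame(time = 1:5, rate = c(NA, 1.1, 1.2, 1.0, 0.9)) %>%
    mutate(pop = accumulate(rate[-1], ~ .x * .y, .init = 50))

print(sim)
```

This assumes a purrr version in which `accumulate()` simplifies its result to an atomic vector; if it returns a list instead, wrapping the call in `unlist()` would be one way to adapt it.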
