How to create a conditional dummy in R?

Problem description

I have a dataframe of time series data with daily observations of temperatures. I need to create a dummy variable that counts each day that has a temperature above a threshold of 5 °C. This would be easy in itself, but an additional condition exists: counting starts only after ten consecutive days above the threshold occur. Here's an example dataframe:

df <- data.frame(date = seq(365), 
         temp = -30 + 0.65*seq(365) - 0.0018*seq(365)^2 + rnorm(365))

I think I got it done, but with too many loops for my liking. This is what I did:

df$dummyUnconditional <- 0
df$dummyHead <- 0
df$dummyTail <- 0

for(i in 1:nrow(df)){
    if(df$temp[i] > 5){
        df$dummyUnconditional[i] <- 1
    }
}

for(i in 1:(nrow(df)-9)){
    if(sum(df$dummyUnconditional[i:(i+9)]) == 10){
        df$dummyHead[i] <- 1
    }
}

# start at 10 so that (i-9):i never touches index 0
for(i in 10:nrow(df)){
    if(sum(df$dummyUnconditional[(i-9):i]) == 10){
        df$dummyTail[i] <- 1
    }
}

df$dummyConditional <- ifelse(df$dummyHead == 1 | df$dummyTail == 1, 1, 0)

Could anyone suggest simpler ways for doing this?

Solution

Here's a base R option using rle:

df$dummy <- with(rle(df$temp > 5), rep(as.integer(values & lengths >= 10), lengths))


Some explanation: The task is a classic use case for the run length encoding (rle) function, imo. We first check if the value of temp is greater than 5 (creating a logical vector) and apply rle on that vector resulting in:

> rle(df$temp > 5)
#Run Length Encoding
#  lengths: int [1:7] 66 1 1 225 2 1 69
#  values : logi [1:7] FALSE TRUE FALSE TRUE FALSE TRUE ...
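
To see what rle returns on something small enough to read by eye, here is a minimal sketch on a made-up logical vector (the vector `x` is only an illustration and is not part of the original data):

```r
# Toy logical vector with four runs: FALSE x2, TRUE x3, FALSE x1, TRUE x1
x <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)

r <- rle(x)
r$lengths  # 2 3 1 1
r$values   # FALSE TRUE FALSE TRUE

# rep() with the run lengths expands the runs back to the original vector
identical(rep(r$values, r$lengths), x)  # TRUE
```

Each element of `lengths` pairs with the element of `values` at the same position, which is why the two can be combined element-wise in the next step.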

Now we want to find those runs where values is TRUE (i.e. temp is greater than 5) and where at the same time lengths is at least 10 (i.e. at least ten consecutive temp values are greater than 5). We do this by running:

values & lengths >= 10

And finally, since we want to return a vector of the same length as nrow(df), we use rep(..., lengths) to repeat each run's result once per day in that run, and as.integer in order to return 1/0 instead of TRUE/FALSE.
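
The same steps can be wrapped in a small helper and checked on a toy series. This is only a sketch: the function name `flag_long_runs` and its arguments `threshold` and `min_run` are hypothetical, introduced here for illustration, and the example uses a minimum run of 3 instead of 10 to keep the vector short. It also assumes the input has no missing values.

```r
# Hypothetical helper: 1 for values above `threshold` that sit in a run of
# at least `min_run` consecutive such values, 0 otherwise.
flag_long_runs <- function(x, threshold = 5, min_run = 10) {
  r <- rle(x > threshold)
  keep <- r$values & r$lengths >= min_run  # one TRUE/FALSE per run
  rep(as.integer(keep), r$lengths)         # expand back to length(x)
}

temp <- c(2, 6, 7, 3, 6, 8, 9, 7, 4, 6)
flag_long_runs(temp, threshold = 5, min_run = 3)
# 0 0 0 0 1 1 1 1 0 0
```

The two-day run (6, 7) and the final single day are above the threshold but too short, so only the four-day run is flagged. With the defaults, `df$dummy <- flag_long_runs(df$temp)` should give the same result as the one-liner above.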
