How to create a conditional dummy in R?
I have a data frame of time series data with daily temperature observations. I need to create a dummy variable that flags each day with a temperature above a threshold of 5 °C. This would be easy in itself, but there is an additional condition: counting starts only after ten consecutive days above the threshold occur. Here's an example data frame:
df <- data.frame(date = seq(365),
temp = -30 + 0.65*seq(365) - 0.0018*seq(365)^2 + rnorm(365))
I think I got it done, but with too many loops for my liking. This is what I did:
df$dummyUnconditional <- 0
df$dummyHead <- 0
df$dummyTail <- 0
for(i in 1:nrow(df)){
if(df$temp[i] > 5){
df$dummyUnconditional[i] <- 1
}
}
for(i in 1:(nrow(df)-9)){
if(sum(df$dummyUnconditional[i:(i+9)]) == 10){
df$dummyHead[i] <- 1
}
}
for(i in 10:nrow(df)){
if(sum(df$dummyUnconditional[(i-9):i]) == 10){
df$dummyTail[i] <- 1
}
}
df$dummyConditional <- ifelse(df$dummyHead == 1 | df$dummyTail == 1, 1, 0)
Could anyone suggest simpler ways for doing this?
Here's a base R option using rle:
df$dummy <- with(rle(df$temp > 5), rep(as.integer(values & lengths >= 10), lengths))
Some explanation: The task is a classic use case for the run length encoding (rle) function, imo. We first check if the value of temp is greater than 5 (creating a logical vector) and apply rle on that vector, resulting in:
> rle(df$temp > 5)
#Run Length Encoding
# lengths: int [1:7] 66 1 1 225 2 1 69
# values : logi [1:7] FALSE TRUE FALSE TRUE FALSE TRUE ...
Now we want to find those runs where values is TRUE (i.e. temp is greater than 5) and where at the same time lengths is at least 10 (i.e. at least ten consecutive temp values are greater than 5). We do this by running:
values & lengths >= 10
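To make the run-level test concrete, here is a minimal sketch on a made-up logical vector, using a minimum run length of 3 instead of 10 so the result is easy to check by hand. The names above, runs and keep are illustrative only and do not come from the original answer.

```r
# Toy data (made up for illustration): TRUE marks a day above the threshold.
above <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE)

# Collapse the vector into runs of identical values.
runs <- rle(above)
runs$lengths  # 2 1 4 1
runs$values   # TRUE FALSE TRUE FALSE

# One logical per run: TRUE only for runs of TRUE that last at least 3 days.
keep <- runs$values & runs$lengths >= 3
keep          # FALSE FALSE TRUE FALSE
```

Note that keep has one element per run, not one per day, which is why a further expansion step is needed.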
And finally, since we want to return a vector of the same length as nrow(df), we use rep(..., lengths) to expand the run-level result back to one value per row, and as.integer in order to return 1/0 instead of TRUE/FALSE.
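The steps above could be wrapped in a small function so the threshold and minimum run length are easy to change. This is only a sketch: flag_runs, threshold and min_run are hypothetical names chosen for illustration, and the temperatures are made up.

```r
# flag_runs() is a hypothetical helper, not part of base R.
# It returns 1 for every element that belongs to a run of at least
# `min_run` consecutive values above `threshold`, and 0 otherwise.
flag_runs <- function(x, threshold = 5, min_run = 10) {
  runs <- rle(x > threshold)
  # Test each run, then repeat the 1/0 result once per element of the run.
  rep(as.integer(runs$values & runs$lengths >= min_run), runs$lengths)
}

# Made-up temperatures; with min_run = 3 only the four-day run qualifies.
temp <- c(6, 7, 2, 8, 9, 6, 7, 1)
flags <- flag_runs(temp, threshold = 5, min_run = 3)
flags  # 0 0 0 1 1 1 1 0
```

With the defaults, flag_runs(df$temp) should give the same vector as the one-liner above. Unlike the head/tail loops in the question, every day inside a qualifying run is flagged, including the days in the middle of a run that is only ten days long.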