Ifelse statement with dataframe subset using date
I am trying to create a function to apply to a variable in a dataframe that, for a window of 2 days forward from the current observation, changes the value of VarD if VarD always takes the value 1 within that date window. In the expected output below, the new column VarC is 0 in that case and equal to VarD otherwise.
The dataframe looks like this:
VarA VarB Date Diff VarD
1 1 2007-04-09 NA 0
1 1 2007-04-10 0 0
1 1 2007-04-11 -2 1
1 1 2007-04-12 0 1
1 1 2007-04-13 2 0
1 1 2007-04-14 0 0
1 1 2007-04-15 -2 1
1 1 2007-04-16 1 0
1 1 2007-04-17 -4 1
1 1 2007-04-18 0 1
1 1 2007-04-19 0 1
1 1 2007-04-20 0 1
The new dataframe should look like the following:
VarA VarB Date Diff VarD VarC
1 1 2007-04-09 NA 0 0
1 1 2007-04-10 0 0 0
1 1 2007-04-11 -2 1 1
1 1 2007-04-12 0 1 1
1 1 2007-04-13 2 0 0
1 1 2007-04-14 0 0 0
1 1 2007-04-15 -2 1 1
1 1 2007-04-16 1 0 0
1 1 2007-04-17 -4 1 0
1 1 2007-04-18 0 1 0
1 1 2007-04-19 0 1 0
1 1 2007-04-20 0 1 0
I have tried the following code:
db$VarC <- 0
for (i in unique(db$VarA)) {
  for (j in unique(db$VarB)) {
    for (n in 1:length(db$Date)) {
      if (db$VarD[n] == 0) {db$VarC[n] <- 0}
      else {db$VarC[n] <- ifelse(0 %in% db[db$Date >= n & db$Date < n + 3, ]$VarC, 1, 0)}
    }
  }
}
But I obtain only zeroes in VarC. I have checked the code without the else branch and it works fine, and R reports no error when the complete code is run. I have no clue where the problem could be.
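A hedged note on the attempt above: two things in the loop look suspicious. The subset compares Date with the row counter n rather than with the current row's date, and it looks up VarC (the column being filled) instead of VarD. The sketch below keeps the date-window idea but compares real dates. It assumes Date parses with as.Date(), and restricting the window to rows with the same VarA and VarB is a guess at what the two outer loops were meant to do.

```r
# Sample data from the question (Date is read as text, then converted).
db <- read.table(text = "VarA VarB Date Diff VarD
1 1 2007-04-09 NA 0
1 1 2007-04-10 0 0
1 1 2007-04-11 -2 1
1 1 2007-04-12 0 1
1 1 2007-04-13 2 0
1 1 2007-04-14 0 0
1 1 2007-04-15 -2 1
1 1 2007-04-16 1 0
1 1 2007-04-17 -4 1
1 1 2007-04-18 0 1
1 1 2007-04-19 0 1
1 1 2007-04-20 0 1", header = TRUE)
db$Date <- as.Date(db$Date)

db$VarC <- db$VarD
for (n in seq_len(nrow(db))) {
  # Rows dated within the 2 days after the current row, in the same VarA/VarB group.
  in_window <- db$VarA == db$VarA[n] & db$VarB == db$VarB[n] &
    db$Date > db$Date[n] & db$Date <= db$Date[n] + 2
  # If VarD is 1 on every row in that window (or the window is empty), set VarC to 0.
  if (all(db$VarD[in_window] == 1)) db$VarC[n] <- 0
}
db
```

Because the window follows calendar days, a gap in Date shortens the window instead of pulling in later rows, which is the main difference from the row-based answers. The cost is a full scan per row, so it suits small data.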
Here are some alternatives. The first one avoids some messy indexing, while the last two do not require any packages.
1) rollapply. This applies the VarC function in a rolling fashion to each 3 elements of db$VarD. align = "left" says that when it passes x to function VarC, x[1] is the current element, x[2] the next and x[3] the one after that, i.e. the current element is the leftmost. partial = TRUE says that if there are not 3 elements available (which is the case for the last and next-to-last elements), then just pass however many remain.
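To make the window shapes concrete, the base-R sketch below builds the same left-aligned, partial windows of width 3 by hand. It only illustrates what VarC receives; it is not zoo's implementation, and left_windows is a hypothetical name.

```r
# Hypothetical helper: the windows a left-aligned, partial rolling call of
# the given width would see (current element first, shortened at the end).
left_windows <- function(x, width = 3L) {
  n <- length(x)
  lapply(seq_len(n), function(i) x[i:min(i + width - 1L, n)])
}

str(left_windows(1:5))
```

The last two windows have only 2 and 1 elements, which is the effect of partial = TRUE at the right end of the vector.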
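library(zoo)
# 0 if every element after the current one is 1, otherwise the current element
VarC <- function(x) if (all(x[-1] == 1)) 0 else x[1]
# left-aligned windows of width 3, shortened at the end of the vector
db$VarC <- rollapply(db$VarD, 3, VarC, partial = TRUE, align = "left")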
giving:
> db
VarA VarB Date Diff VarD VarC
1 1 1 2007-04-09 NA 0 0
2 1 1 2007-04-10 0 0 0
3 1 1 2007-04-11 -2 1 1
4 1 1 2007-04-12 0 1 1
5 1 1 2007-04-13 2 0 0
6 1 1 2007-04-14 0 0 0
7 1 1 2007-04-15 -2 1 1
8 1 1 2007-04-16 1 0 0
9 1 1 2007-04-17 -4 1 0
10 1 1 2007-04-18 0 1 0
11 1 1 2007-04-19 0 1 0
12 1 1 2007-04-20 0 1 0
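The VarC function used in option 1 packs the whole rule into one line. The calls below are a small sanity check, not part of the answer, showing how it treats full windows, shortened windows and the last row, where x[-1] is empty and all(logical(0)) is TRUE.

```r
VarC <- function(x) if (all(x[-1] == 1)) 0 else x[1]

VarC(c(1, 1, 0))  # the later values are not all 1, so the current value 1 is kept
VarC(c(1, 1, 1))  # the next two are both 1, so the result is 0
VarC(c(0, 1, 1))  # same rule when the current value is already 0
VarC(c(1, 1))     # shortened window near the end: one later value, equal to 1
VarC(1)           # last row: no later values, so the result is 0
```

If the final rows should instead keep their VarD value, that edge case would need an explicit length check; the expected output in the question does use 0 there.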
2) sapply. Or, using VarC from above:
n <- nrow(db)
db$VarC <- sapply(1:n, function(i) VarC(db$VarD[i:min(i+2, n)]))
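A slightly stricter variant of option 2, offered as a suggestion rather than a correction, uses vapply, which checks that every call returns exactly one number, and seq_len(n), which is empty when n is 0 (1:0 would give c(1, 0)). The VarD vector is the one from the question.

```r
VarD <- c(0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1)
VarC <- function(x) if (all(x[-1] == 1)) 0 else x[1]

n <- length(VarD)
# One numeric result per row, computed on the current row plus up to two more.
res <- vapply(seq_len(n), function(i) VarC(VarD[i:min(i + 2, n)]), numeric(1))
res
```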
3) for. Or, using n and VarC from above:
db$VarC <- NA
for(i in 1:n) db$VarC[i] <- VarC(db$VarD[i:min(i+2, n)])
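For larger data, a loop-free base-R alternative (not part of the original answer) compares each value with the next two using shifted copies of the vector. This sketch assumes a window of the current row plus the next two rows, rows sorted by date, and a VarD column without missing values of its own; lead_by is a hypothetical helper.

```r
VarD <- c(0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1)

# Hypothetical helper: the value k rows ahead, NA past the end of the vector.
lead_by <- function(v, k) c(v[-seq_len(k)], rep(NA, k))[seq_along(v)]

nxt1 <- lead_by(VarD, 1)
nxt2 <- lead_by(VarD, 2)
# Look-ahead values past the end count as 1, mirroring partial = TRUE
# together with all(logical(0)) being TRUE in VarC.
all_ones <- (is.na(nxt1) | nxt1 == 1) & (is.na(nxt2) | nxt2 == 1)
res <- ifelse(all_ones, 0, VarD)
res
```

Because an NA in VarD itself would also be read as 1, this version needs the no-missing-values assumption; the rollapply and sapply versions would propagate the NA instead.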
Note: The input db in reproducible form is:
Lines <- "VarA VarB Date Diff VarD VarC
1 1 2007-04-09 NA 0 0
1 1 2007-04-10 0 0 0
1 1 2007-04-11 -2 1 1
1 1 2007-04-12 0 1 1
1 1 2007-04-13 2 0 0
1 1 2007-04-14 0 0 0
1 1 2007-04-15 -2 1 1
1 1 2007-04-16 1 0 0
1 1 2007-04-17 -4 1 0
1 1 2007-04-18 0 1 0
1 1 2007-04-19 0 1 0
1 1 2007-04-20 0 1 0 "
db <- read.table(text = Lines, header = TRUE)
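The question's loop iterates over VarA and VarB, while the answers window over the whole VarD column at once, which is fine here because every row has VarA = VarB = 1. If the real data hold several VarA/VarB combinations (an assumption), one way to keep windows from crossing group boundaries is base R's ave(). The toy data and the roll_VarC name below are hypothetical, and rows are assumed to be in date order within each group.

```r
# Hypothetical data with two groups (VarA = 1 and VarA = 2), already in date order.
db <- data.frame(
  VarA = rep(c(1, 2), each = 4),
  VarB = 1,
  VarD = c(1, 1, 0, 1,  1, 0, 1, 1)
)
VarC <- function(x) if (all(x[-1] == 1)) 0 else x[1]

# The same 3-row, left-aligned window, applied within one group's VarD values.
roll_VarC <- function(v) {
  n <- length(v)
  vapply(seq_len(n), function(i) VarC(v[i:min(i + 2, n)]), numeric(1))
}
# ave() splits VarD by VarA/VarB, applies roll_VarC to each piece, and
# puts the results back in the original row order.
db$VarC <- ave(db$VarD, db$VarA, db$VarB, FUN = roll_VarC)
db
```

Row 4 shows why the grouping matters: within group 1 it is the last row and gets 0, whereas an ungrouped window would reach into group 2, see the 0 in row 6, and keep the value 1.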