Count number of times a factor appears during rolling window

Problem description

I want to generate the column PriorityCountinLast7days. For a given employee (Emp1, e.g. A), this column counts the number of cases in the last 7 days whose Priority is the same as that of the current case. How would I do that in R using the first four columns?

data <- data.frame(
    Date = c("2018-06-01", "2018-06-03", "2018-06-03", "2018-06-03", "2018-06-03",
             "2018-06-04", "2018-06-01", "2018-06-02", "2018-06-03"),
    Emp1 = c("A","A","A","A","A","A","B","B","B"),
    Case = c("A1", "A2", "A3", "A4", "A5", "A6", "B1", "B2", "B3"),
    Priority = c(0,0,0,1,2,0,0,0,0),
    PriorityCountinLast7days = c(0,1,2,1,1,3,1,2,3))

+------------+------+------+----------+--------------------------+
|    Date    | Emp1 | Case | Priority | PriorityCountinLast7days |
+------------+------+------+----------+--------------------------+
| 2018-06-01 | A    | A1   |        0 |                        0 |
| 2018-06-03 | A    | A2   |        0 |                        1 |
| 2018-06-03 | A    | A3   |        0 |                        2 |
| 2018-06-03 | A    | A4   |        1 |                        1 |
| 2018-06-03 | A    | A5   |        2 |                        1 |
| 2018-06-04 | A    | A6   |        0 |                        3 |
| 2018-06-01 | B    | B1   |        0 |                        1 |
| 2018-06-02 | B    | B2   |        0 |                        2 |
| 2018-06-03 | B    | B3   |        0 |                        3 |
+------------+------+------+----------+--------------------------+


Recommended answer

You can accomplish this rolling window with an iterative conditional sum over your full dataset. What does this mean? Inside a for loop, for each row you keep the rows whose date is on or before the current date AND on or after the date window - 1 days earlier (6 days for the default 7-day window, so the current day is included) AND whose Case equals the current case, and then sum their Priority values. Combining these conditions in a loop creates the rolling filter for you. Here is a function:

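rollPriority <- function(data, window = 7){
  # Fail early if any required column is missing
  stopifnot(all(c("Date", "Case", "Priority") %in% colnames(data)))
  data$Date <- as.Date(data$Date)
  counts <- numeric(nrow(data))          # preallocate the result column
  for(i in seq_len(nrow(data))){         # seq_len() is safe for 0-row input
    # (current date - (window - 1)) <= Date <= current date,
    # i.e. `window` calendar days ending on the current row's date
    datecheck <- (data$Date[i] - (window - 1)) <= data$Date & data$Date <= data$Date[i]
    # Only rows that belong to the same Case as the current row
    casecheck <- data$Case == data$Case[i]
    # Sum Priority over the matching rows (with 0/1 priorities this counts the 1s)
    counts[i] <- sum(data$Priority[datecheck & casecheck])
  }
  # Name the new column after the window length, e.g. "PriorityCountinLast7days"
  data[[paste0("PriorityCountinLast", window, "days")]] <- counts
  return(data)
}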
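As a quick check of the inclusive window boundaries, the snippet below evaluates the same date condition used in rollPriority() for a single hypothetical row dated 2018-06-07 with window = 7. The dates are made up purely for illustration. The window starts window - 1 = 6 days earlier, so only 2018-06-01 through 2018-06-07 are kept.

```r
# Hypothetical dates, only to illustrate the date condition from rollPriority()
dates   <- as.Date(c("2018-05-31", "2018-06-01", "2018-06-04", "2018-06-07", "2018-06-08"))
window  <- 7
current <- as.Date("2018-06-07")

# Same test as `datecheck`: both endpoints are inclusive
in_window <- (current - (window - 1)) <= dates & dates <= current
print(in_window)
# [1] FALSE  TRUE  TRUE  TRUE FALSE
```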

In the future, please provide example data that reproduces your expected output. You will notice that with only 4 days of data we cannot check your expected 7-day rolling output. A quick way to build such data is to use expand.grid() to generate combinations and set.seed() to make the random sampling reproducible:

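# Reproducible Example Data
dat <- expand.grid(Date = seq.Date(as.Date("2018-06-01"),
                                   as.Date("2018-06-04"),
                                   by = "day"),
                   Case = as.factor(sort(apply(expand.grid(c("A","B"), 1:2),
                                               1,
                                               paste0,
                                               collapse = ""))))
# Ensures random sampling is identical each time. The output below was produced
# with the sampler used before R 3.6.0; on newer R versions use
# set.seed(42, sample.kind = "Rounding") to reproduce it exactly.
set.seed(42)
dat$Priority <- sample(0:1, nrow(dat), replace = TRUE)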

# The function
rollPriority(dat, 2)
#         Date Case Priority PriorityCountinLast2days
#1  2018-06-01   A1        1                        1
#2  2018-06-02   A1        1                        2
#3  2018-06-03   A1        0                        1
#4  2018-06-04   A1        1                        1
#5  2018-06-01   A2        1                        1
#6  2018-06-02   A2        1                        2
#7  2018-06-03   A2        1                        2
#8  2018-06-04   A2        0                        1
#9  2018-06-01   B1        1                        1
#10 2018-06-02   B1        1                        2
#11 2018-06-03   B1        0                        1
#12 2018-06-04   B1        1                        1
#13 2018-06-01   B2        1                        1
#14 2018-06-02   B2        0                        1
#15 2018-06-03   B2        0                        0
#16 2018-06-04   B2        1                        1

This way it is easier for someone to accurately assist you.
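The function above groups rows by Case and sums Priority. The question itself asks, per employee, how many cases share the current case's Priority. Below is a minimal sketch of the same loop idea adapted to those columns, written with vapply() instead of a for loop; rollPriorityByEmp() is a hypothetical name, not part of the original answer. Assumptions: the window is defined by date, the current case is counted, and all cases on the same date count regardless of row order (subtract 1 to exclude the current case). Because of this, it will not reproduce every value in the expected column of the question.

```r
# Hypothetical adaptation (not from the original answer): for each row, count the
# cases of the same employee (Emp1) with the same Priority whose Date falls in the
# `window` days ending on that row's date, the current row included.
rollPriorityByEmp <- function(data, window = 7) {
  stopifnot(all(c("Date", "Emp1", "Priority") %in% colnames(data)))
  data$Date <- as.Date(data$Date)
  counts <- vapply(seq_len(nrow(data)), function(i) {
    in_window <- (data$Date[i] - (window - 1)) <= data$Date & data$Date <= data$Date[i]
    same_emp  <- data$Emp1 == data$Emp1[i]
    same_prio <- data$Priority == data$Priority[i]
    sum(in_window & same_emp & same_prio)   # number of matching rows
  }, integer(1))
  data[[paste0("PriorityCountinLast", window, "days")]] <- counts
  data
}

# The question's first four columns
data <- data.frame(
  Date = c("2018-06-01", "2018-06-03", "2018-06-03", "2018-06-03", "2018-06-03",
           "2018-06-04", "2018-06-01", "2018-06-02", "2018-06-03"),
  Emp1 = c("A","A","A","A","A","A","B","B","B"),
  Case = c("A1", "A2", "A3", "A4", "A5", "A6", "B1", "B2", "B3"),
  Priority = c(0,0,0,1,2,0,0,0,0)
)
result <- rollPriorityByEmp(data, 7)
result$PriorityCountinLast7days
# [1] 1 3 3 1 1 4 1 2 3
```

vapply() with integer(1) guarantees one integer per row, which makes the result type explicit compared with growing a column inside a for loop.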