How do I create columns which count another column with certain conditions? (R)

Problem description

Below, the data has already been reshaped, and the inputs and expected output are listed.

Data

df <- structure(list(record_id = c(110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101, 
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101
), start = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 
31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59), stop = c(1, 
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 
52, 53, 54, 55, 56, 57, 58, 59, 60), `treatment (type)` = c(1, 
1, 1, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 0, 0, 0, 3, 3, 0, 3, 3, 3, 
0, 2, 2, 2, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), n_interruption_periods = c(0, 
0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 
4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), n_interruption_periods_3days = c(0, 
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), n_interruption_days_3days = c(0, 
0, 0, 0, 0, 1, 2, 2, 2, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 
6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7)), row.names = c(NA, 
-60L), class = c("tbl_df", "tbl", "data.frame"))

Explanation

Input: start and stop are the day counts. The daily treatment is stored in treatment (type), where 0 = no treatment (an interruption) and 1 to 3 are treatments A, B and C.

Output: based on the treatment column, I want to count, for each day:

  1. n_interruption_periods: the cumulative number of interruption periods, regardless of how long each interruption lasts
  2. n_interruption_periods_3days: the cumulative number of interruption periods, counting only interruptions that last >= 3 days (each one is counted on its 3rd day). Interruptions shorter than 3 days are not of interest.
  3. n_interruption_days_3days: the cumulative number of interruption days, where an interruption's days are only counted from its 3rd day onward.
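As a cross-check of these three definitions (not the method used in the answer below), here is a minimal base-R sketch based on run-length encoding. It assumes the rows belong to a single record and are sorted by day; interruption_counts is a hypothetical helper name, and the treatment vector is the treatment (type) column from the data above.

```r
# Hypothetical helper: computes the three columns from one record's
# treatment vector (rows assumed sorted by day).
interruption_counts <- function(treatment, min_days = 3L) {
  r <- rle(treatment == 0)                          # runs of interruption (TRUE) / treatment (FALSE) days
  day_in_run <- unlist(lapply(r$lengths, seq_len))  # 1, 2, 3, ... within each run
  is_zero    <- rep(r$values, r$lengths)            # TRUE on interruption days
  data.frame(
    # a new period starts on day 1 of every run of zeros
    n_interruption_periods       = cumsum(is_zero & day_in_run == 1L),
    # a long period is counted once, on its min_days-th day
    n_interruption_periods_3days = cumsum(is_zero & day_in_run == min_days),
    # interruption days are counted from day min_days onward
    n_interruption_days_3days    = cumsum(is_zero & day_in_run >= min_days)
  )
}

# treatment (type) column of the 60-day example
treatment <- c(1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 0, 0, 0, 3, 3, 0, 3, 3, 3,
               0, 2, 2, 2, 0, 0, 0, rep(1, 31))
res <- interruption_counts(treatment)
```

Because each definition only depends on "which day of the current run of zeros is this", one rle() pass is enough; with several records, the helper would have to be applied per record_id.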

Question: I want to create a script that calculates the output variables above automatically from the treatment variable.

Hope you can help.

Update (response to the answer):

Here is part of the data that illustrates the problem:

structure(list(record_id = c(110001, 110002, 110002, 110002, 
110001), day_count = c(732, 0, 1, 2, 733), day_count_stop = c(733, 
1, 2, 3, 734), oac_class = c(0, 1, 1, 1, 1), n_interruption_periods = c(1, 
1, 0, 0, 1), n_interruption_periods_3days = c(1, 1, 0, 0, 1)), row.names = c(NA, 
-5L), groups = structure(list(record_id = c(110001, 110002), 
    .rows = structure(list(c(1L, 5L), 2:4), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

With the suggested code, there are two issues:

  1. I believe the resulting vector is not assigned to the correct positions. Here you can see that the first rows of record 110002 in n_interruption_periods and n_interruption_periods_3days are carried over from the results for 110001.

  2. When I try to compute the third variable, I get this error: Error in while (any(d != 0)) { : missing value where TRUE/FALSE needed
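The function that raised this error is not shown in the post, so its exact cause cannot be confirmed here. One common way to get precisely this message is a while() condition that evaluates to NA: any() returns NA when no element is TRUE and at least one is NA, for example when d contains missing values. A minimal base-R demonstration with a made-up d:

```r
# Hypothetical vector d with a missing value (not the OP's data)
d <- c(0, NA, 0)
cond <- any(d != 0)   # FALSE, NA, FALSE -> NA, because no element is TRUE

# while() needs a single TRUE or FALSE; NA raises the error from the post
msg <- tryCatch({
  while (cond) break
  "no error"
}, error = function(e) conditionMessage(e))

# Dropping missing values first gives a usable FALSE
cond_safe <- any(d != 0, na.rm = TRUE)

print(cond)       # NA
print(msg)        # "missing value where TRUE/FALSE needed"
print(cond_safe)  # FALSE
```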

Recommended answer

I think we can modify the function I wrote in your last post to solve all your problems. Consider the following function.

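conditional_count <- function(x, n, pfill = function(p0) integer(length(p0)), ifill = seq_along, iend = 30L) {
  len <- length(x); out <- integer(len)
  # positions of interruption days (x == 0)
  p0 <- which(x == 0L)
  # keep only positions that are at least the n-th consecutive interruption day:
  # at step i, drop position p unless x[p - i + 1] is also 0
  if (n > 1L)
    p0 <- Reduce(function(idx, i) {
      lidx <- idx - i + 1L
      idx <- idx[lidx > 0L]; lidx <- lidx[lidx > 0L]
      idx[x[lidx] == 0L]
    }, seq_len(n)[-1L], p0)
  # no valid interruption: return all zeros
  if (length(p0) < 1L)
    return(out)
  # each segment runs from p0[i] to the next valid position (or the end of x),
  # and is at most iend positions long
  ub <- pmin(c(tail(p0, -1L), len), p0 + iend - 1L)
  rl <- ub - p0 + 1L
  # one pfill value per valid position
  pfill <- pfill(p0)
  # values of each segment: ifill(...) + pfill
  res <- unlist(lapply(seq_along(rl), function(i) ifill(integer(rl[[i]])) + pfill[[i]]))
  # target positions p0[i], p0[i] + 1, ..., ub[i]; where two segments overlap,
  # the value written last (the later segment) wins
  pos <- inverse.rle(list(lengths = rl, values = p0)) + unlist(lapply(rl, seq_len)) - 1L
  `[<-`(out, pos, res)
}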

Let p0 be the vector that contains the positions of all valid interruptions identified. pfill and ifill are two control functions of conditional_count: pfill controls how each position in p0 is filled, and ifill controls how the gap between two valid interruptions is filled. The final sequence between two valid interruptions is ifill + pfill, and iend caps the length of that sequence. See the illustration below (x is treatment (type)):

ifill controls the numbers  at *:             *****   *****   (iend = 5L for example)
pfill controls the numbers  at ?:             ?       ?
p0 identifies                   :             v       v
x looks like                    :   1    2    0  ...  0  ... 
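The first step, identifying p0, can be tried in isolation. valid_positions below is a hypothetical helper that contains only the which() and Reduce() part of conditional_count, run on a made-up vector x:

```r
# Hypothetical helper: the p0-identification step of conditional_count.
# A position p is kept only if x[p], x[p - 1], ..., x[p - n + 1] are all 0,
# i.e. p is at least the n-th consecutive interruption day.
valid_positions <- function(x, n) {
  p0 <- which(x == 0L)
  if (n > 1L)
    p0 <- Reduce(function(idx, i) {
      lidx <- idx - i + 1L                     # look back i - 1 days
      idx <- idx[lidx > 0L]; lidx <- lidx[lidx > 0L]
      idx[x[lidx] == 0L]                       # keep p only if that day is also 0
    }, seq_len(n)[-1L], p0)
  p0
}

x <- c(1, 0, 0, 0, 2, 0, 0)
valid_positions(x, 1L)  # 2 3 4 6 7
valid_positions(x, 2L)  # 3 4 7
valid_positions(x, 3L)  # 4: only the run at days 2-4 reaches a 3rd day
```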

Using n = 1 as an example, your last problem simplifies to

conditional_count(x, 1L, function(p0) integer(length(p0)), seq_along, 30L)

ifill + pfill                              :       1 2 3 4 ...  1 2 3 4 ...
ifill is a sequence along the gap positions:       1 2 3 4 ...  1 2 3 4 ...
pfill is always 0 at all positions of p0   :       0            0       
p0 identifies                              :       v            v       
x looks like                               :   1 2 0 ........   0       
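To see how ifill, pfill and iend combine in this n = 1 case, the last lines of conditional_count can be traced by hand. This sketch uses hypothetical inputs (p0, len) and a small iend = 3 instead of 30 so that the cap is visible:

```r
# Hypothetical inputs: valid positions, vector length, and a small cap
p0   <- c(3L, 8L)
len  <- 12L
iend <- 3L

ub  <- pmin(c(tail(p0, -1L), len), p0 + iend - 1L)  # 5 10: both segments capped by iend
rl  <- ub - p0 + 1L                                 # 3 3
pfill <- integer(length(p0))                        # pfill is always 0 in this case
# ifill = seq_along numbers the positions of each segment 1, 2, 3, ...
res <- unlist(lapply(seq_along(rl), function(i) seq_along(integer(rl[[i]])) + pfill[[i]]))
pos <- inverse.rle(list(lengths = rl, values = p0)) + unlist(lapply(rl, seq_len)) - 1L
out <- `[<-`(integer(len), pos, res)
out  # 0 0 1 2 3 0 0 1 2 3 0 0
```

Positions past the cap (here 6-7 and 11-12) keep their initial 0.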

For n_interruption_periods, the problem simplifies to

conditional_count(x, 1L, function(p0) cumsum(p0 - head(c(-1L, p0), -1L) > 1L), function(x) integer(length(x)), Inf)

ifill + pfill                                  :       1 1 1 ...     2 2 ...
ifill is always 0 along the gap positions      :       0 0 0 ...     0 0 ...  (iend = Inf means filling in a sequence until the end of the gap)
pfill increases 1 at each starting streak of 0s:       1             2
p0 identifies                                  :       v v v         v v
x looks like                                   :   1 2 0 0 0 ....... 0 0 ...
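count_streak, used as pfill here, numbers each run of consecutive positions in p0: head(c(-1L, p0), -1L) shifts p0 by one, so every position is compared with the previous valid position, and a gap larger than 1 starts a new streak. A small check with hypothetical positions:

```r
# count_streak as written in the answer; -1L is a sentinel that makes
# the first position always start a streak
count_streak <- function(p0) cumsum(p0 - head(c(-1L, p0), -1L) > 1L)

# hypothetical positions: zeros on days 3-5 and 9-10
p0   <- c(3L, 4L, 5L, 9L, 10L)
gaps <- p0 - head(c(-1L, p0), -1L)   # 4 1 1 4 1
count_streak(p0)                     # 1 1 1 2 2
```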

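Similarly, counting interruption days cumulatively (every day of every interruption, n = 1) simplifies to

conditional_count(x, 1L, seq_along, function(x) integer(length(x)), Inf)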

ifill + pfill                            :       1 2 3 ...     4 5 ...
ifill is always 0 along the gap positions:       0 0 0 ...     0 0 ...  (iend = Inf means filling in a sequence until the end of the gap)
pfill increases 1 at each 0              :       1 2 3         4 5
p0 identifies                            :       v v v         v v
x looks like                             :   1 2 0 0 0 ....... 0 0 ...
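With ifill returning zeros and iend = Inf, each segment is just its pfill value repeated up to the next valid position, so the result is a forward fill. Neighbouring segments overlap by one position, and because R's `[<-` keeps the value assigned last, the newer value wins. The sketch below rebuilds this step for hypothetical positions and compares pfill = seq_along (this case) with the streak numbers count_streak would give; fill_forward is a hypothetical helper:

```r
p0  <- c(3L, 4L, 5L, 9L, 10L)   # hypothetical valid positions
len <- 12L
ub  <- pmin(c(tail(p0, -1L), len), p0 + Inf - 1L)   # 4 5 9 10 12
rl  <- ub - p0 + 1L                                  # 2 2 5 2 3
pos <- inverse.rle(list(lengths = rl, values = p0)) + unlist(lapply(rl, seq_len)) - 1L

# ifill is all zeros, so ifill + pfill is just pfill repeated over each segment
fill_forward <- function(pfill_values) {
  res <- rep(pfill_values, rl)
  `[<-`(integer(len), pos, res)
}

days    <- fill_forward(seq_along(p0))        # 0 0 1 2 3 3 3 3 4 5 5 5
streaks <- fill_forward(c(1L, 1L, 1L, 2L, 2L)) # values of count_streak(p0)
```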

The complete script for this problem is

library(dplyr)  # provides %>% and mutate()

conditional_count <- function(x, n, pfill = function(p0) integer(length(p0)), ifill = seq_along, iend = 30L) {
  len <- length(x); out <- integer(len)
  p0 <- which(x == 0L)
  if (n > 1L)
    p0 <- Reduce(function(idx, i) {
      lidx <- idx - i + 1L
      idx <- idx[lidx > 0L]; lidx <- lidx[lidx > 0L]
      idx[x[lidx] == 0L]
    }, seq_len(n)[-1L], p0)
  if (length(p0) < 1L)
    return(out)
  ub <- pmin(c(tail(p0, -1L), len), p0 + iend - 1L)
  rl <- ub - p0 + 1L
  pfill <- pfill(p0)
  res <- unlist(lapply(seq_along(rl), function(i) ifill(integer(rl[[i]])) + pfill[[i]]))
  pos <- inverse.rle(list(lengths = rl, values = p0)) + unlist(lapply(rl, seq_len)) - 1L
  `[<-`(out, pos, res)
}

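# pfill: number the streaks of consecutive valid positions (1, 1, 1, 2, 2, ...)
count_streak <- function(p0) cumsum(p0 - head(c(-1L, p0), -1L) > 1L)
# ifill: all zeros, so every segment is filled with its pfill value only
integer_along <- function(x) integer(length(x))

# assumes df holds a single record sorted by day (no grouping is done here)
df %>%
  mutate(
    # any interruption, counted from its first day
    n_interruption_periods = conditional_count(`treatment (type)`, 1L, count_streak, integer_along, Inf),
    # interruptions of >= 3 days, counted from their 3rd day
    n_interruption_periods_3days = conditional_count(`treatment (type)`, 3L, count_streak, integer_along, Inf),
    # interruption days, counted from day 3 of each interruption onward
    n_interruption_days_3days = conditional_count(`treatment (type)`, 3L, seq_along, integer_along, Inf)
  )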
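Issue 1 from the question (record 110002 starting with values left over from 110001) is consistent with running a whole-vector function like this over several records at once. A minimal base-R sketch, assuming each record should be processed on its own with its rows sorted by day: it applies conditional_count (copied unchanged from the answer) per record_id with ave() on a made-up two-record data frame d. With dplyr, the analogous step would be group_by(record_id) and arrange(start, .by_group = TRUE) before mutate().

```r
# conditional_count, count_streak and integer_along copied from the answer
conditional_count <- function(x, n, pfill = function(p0) integer(length(p0)), ifill = seq_along, iend = 30L) {
  len <- length(x); out <- integer(len)
  p0 <- which(x == 0L)
  if (n > 1L)
    p0 <- Reduce(function(idx, i) {
      lidx <- idx - i + 1L
      idx <- idx[lidx > 0L]; lidx <- lidx[lidx > 0L]
      idx[x[lidx] == 0L]
    }, seq_len(n)[-1L], p0)
  if (length(p0) < 1L)
    return(out)
  ub <- pmin(c(tail(p0, -1L), len), p0 + iend - 1L)
  rl <- ub - p0 + 1L
  pfill <- pfill(p0)
  res <- unlist(lapply(seq_along(rl), function(i) ifill(integer(rl[[i]])) + pfill[[i]]))
  pos <- inverse.rle(list(lengths = rl, values = p0)) + unlist(lapply(rl, seq_len)) - 1L
  `[<-`(out, pos, res)
}
count_streak  <- function(p0) cumsum(p0 - head(c(-1L, p0), -1L) > 1L)
integer_along <- function(x) integer(length(x))

# Hypothetical data: two records stored interleaved, as in the question's update
d <- data.frame(
  record_id = c(2, 1, 1, 2, 1, 2, 1, 2),
  start     = c(0, 0, 1, 1, 2, 2, 3, 3),
  treatment = c(0, 1, 0, 0, 0, 0, 0, 1)
)
d <- d[order(d$record_id, d$start), ]   # days must be in order within each record

# Ungrouped: record 2 starts with the count left over from record 1
ungrouped <- conditional_count(d$treatment, 3L, count_streak, integer_along, Inf)

# Per record: ave() calls the function once per record_id and puts the results back in place
d$n_interruption_periods_3days <- ave(
  d$treatment, d$record_id,
  FUN = function(x) conditional_count(x, 3L, count_streak, integer_along, Inf)
)

ungrouped                        # 0 0 0 1 1 1 1 1
d$n_interruption_periods_3days   # 0 0 0 1 0 0 1 1
```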

Output

   record_id start stop treatment (type) n_interruption_periods n_interruption_periods_3days n_interruption_days_3days
1     110101     0    1                1                      0                            0                         0
2     110101     1    2                1                      0                            0                         0
3     110101     2    3                1                      0                            0                         0
4     110101     3    4                0                      1                            0                         0
5     110101     4    5                0                      1                            0                         0
6     110101     5    6                0                      1                            1                         1
7     110101     6    7                0                      1                            1                         2
8     110101     7    8                2                      1                            1                         2
9     110101     8    9                2                      1                            1                         2
10    110101     9   10                2                      1                            1                         2
11    110101    10   11                0                      2                            1                         2
12    110101    11   12                0                      2                            1                         2
13    110101    12   13                0                      2                            2                         3
14    110101    13   14                0                      2                            2                         4
15    110101    14   15                0                      2                            2                         5
16    110101    15   16                0                      2                            2                         6
17    110101    16   17                3                      2                            2                         6
18    110101    17   18                3                      2                            2                         6
19    110101    18   19                0                      3                            2                         6
20    110101    19   20                3                      3                            2                         6
21    110101    20   21                3                      3                            2                         6
22    110101    21   22                3                      3                            2                         6
23    110101    22   23                0                      4                            2                         6
24    110101    23   24                2                      4                            2                         6
25    110101    24   25                2                      4                            2                         6
26    110101    25   26                2                      4                            2                         6
27    110101    26   27                0                      5                            2                         6
28    110101    27   28                0                      5                            2                         6
29    110101    28   29                0                      5                            3                         7
30    110101    29   30                1                      5                            3                         7
31    110101    30   31                1                      5                            3                         7
32    110101    31   32                1                      5                            3                         7
33    110101    32   33                1                      5                            3                         7
34    110101    33   34                1                      5                            3                         7
35    110101    34   35                1                      5                            3                         7
36    110101    35   36                1                      5                            3                         7
37    110101    36   37                1                      5                            3                         7
38    110101    37   38                1                      5                            3                         7
39    110101    38   39                1                      5                            3                         7
40    110101    39   40                1                      5                            3                         7
41    110101    40   41                1                      5                            3                         7
42    110101    41   42                1                      5                            3                         7
43    110101    42   43                1                      5                            3                         7
44    110101    43   44                1                      5                            3                         7
45    110101    44   45                1                      5                            3                         7
46    110101    45   46                1                      5                            3                         7
47    110101    46   47                1                      5                            3                         7
48    110101    47   48                1                      5                            3                         7
49    110101    48   49                1                      5                            3                         7
50    110101    49   50                1                      5                            3                         7
51    110101    50   51                1                      5                            3                         7
52    110101    51   52                1                      5                            3                         7
53    110101    52   53                1                      5                            3                         7
54    110101    53   54                1                      5                            3                         7
55    110101    54   55                1                      5                            3                         7
56    110101    55   56                1                      5                            3                         7
57    110101    56   57                1                      5                            3                         7
58    110101    57   58                1                      5                            3                         7
59    110101    58   59                1                      5                            3                         7
60    110101    59   60                1                      5                            3                         7
