How do I create columns which count another column with certain conditions? (R)
Problem description
Below, the data has already been reshaped, and the inputs and expected output are listed.
Data
structure(list(record_id = c(110101, 110101, 110101, 110101,
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101,
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101,
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101,
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101,
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101,
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101,
110101, 110101, 110101, 110101, 110101, 110101, 110101, 110101
), start = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59), stop = c(1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
52, 53, 54, 55, 56, 57, 58, 59, 60), `treatment (type)` = c(1,
1, 1, 0, 0, 0, 0, 2, 2, 2, 0, 0, 0, 0, 0, 0, 3, 3, 0, 3, 3, 3,
0, 2, 2, 2, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), n_interruption_periods = c(0,
0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3,
4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), n_interruption_periods_3days = c(0,
0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), n_interruption_days_3days = c(0,
0, 0, 0, 0, 1, 2, 2, 2, 2, 2, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7)), row.names = c(NA,
-60L), class = c("tbl_df", "tbl", "data.frame"))
Explanation
Input
start and stop are the day counts. The daily treatment is listed in treatment, with 0 = no treatment (an interruption) and 1:3 = treatment A/B/C.
Output
Based on the treatment column, I want to count per day:
- n_interruption_periods: the number of interruption periods so far, irrespective of the duration of each interruption.
- n_interruption_periods_3days: the number of interruptions so far, counting an interruption only once its duration is >= 3 days. Interruptions shorter than 3 days are not of interest.
- n_interruption_days_3days: the cumulative number of interruption days, where days are counted only from day 3 of an interruption onward.
Question
I want to create a script which calculates the abovementioned output variables automatically based on the treatment variable.
I hope you can help.
Response from OP
Here is a part of the data which illustrates the problem:
structure(list(record_id = c(110001, 110002, 110002, 110002,
110001), day_count = c(732, 0, 1, 2, 733), day_count_stop = c(733,
1, 2, 3, 734), oac_class = c(0, 1, 1, 1, 1), n_interruption_periods = c(1,
1, 0, 0, 1), n_interruption_periods_3days = c(1, 1, 0, 0, 1)), row.names = c(NA,
-5L), groups = structure(list(record_id = c(110001, 110002),
.rows = structure(list(c(1L, 5L), 2:4), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
With the suggested code, there are two issues:
- I believe the resultant vector was not assigned to the correct position. Here you can see that the first rows of record 110002 in n_interruption_periods and n_interruption_periods_3days are carried over from the results of record 110001.
- When I try to compute the third vector, I receive this error: Error in while (any(d != 0)) { : missing value where TRUE/FALSE needed
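The carry-over in the first point is what one would see if a counting function runs over the whole column instead of once per record_id. The post does not confirm that this is the cause, so the following is only a minimal base R sketch of the effect. The toy data and the helper count_periods are hypothetical and not part of the original post.

```r
# Hypothetical toy data: two records stacked in one data frame.
df <- data.frame(
  record_id = c(1, 1, 1, 2, 2, 2),
  treatment = c(1, 0, 0, 1, 1, 0)
)

# Hypothetical helper: running count of interruption periods.
# A period starts on a 0 that is not preceded by another 0.
count_periods <- function(x) {
  is_off <- x == 0
  cumsum(is_off & !c(FALSE, head(is_off, -1L)))
}

# Applied to the whole column, record 2 inherits the count of record 1.
ungrouped <- count_periods(df$treatment)

# Applied per record with ave(), every record starts again at 0.
grouped <- ave(df$treatment, df$record_id, FUN = count_periods)

print(ungrouped)
print(grouped)
```

With dplyr, the equivalent of the ave() call is to put group_by(record_id) in front of mutate().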
Recommended answer
I think we can modify the function I wrote in your last post to solve all your problems. Consider the following function.
conditional_count <- function(x, n, pfill = function(p0) integer(length(p0)), ifill = seq_along, iend = 30L) {
  len <- length(x); out <- integer(len)
  # positions of all zeros (interruption days)
  p0 <- which(x == 0L)
  # keep only the zeros whose previous n - 1 values are also zero
  if (n > 1L)
    p0 <- Reduce(function(idx, i) {
      lidx <- idx - i + 1L
      idx <- idx[lidx > 0L]; lidx <- lidx[lidx > 0L]
      idx[x[lidx] == 0L]
    }, seq_len(n)[-1L], p0)
  if (length(p0) < 1L)
    return(out)
  # each fill runs from p0[i] to the next valid position (or the end), capped at iend values
  ub <- pmin(c(tail(p0, -1L), len), p0 + iend - 1L)
  rl <- ub - p0 + 1L
  pfill <- pfill(p0)
  res <- unlist(lapply(seq_along(rl), function(i) ifill(integer(rl[[i]])) + pfill[[i]]))
  pos <- inverse.rle(list(lengths = rl, values = p0)) + unlist(lapply(rl, seq_len)) - 1L
  # where fills overlap, the later value wins
  `[<-`(out, pos, res)
}
Let p0 be the vector that contains the positions of all valid interruptions identified. pfill and ifill are two control functions for conditional_count. pfill controls how to fill in each position in the vector p0; ifill controls how to fill in the gap between two valid interruptions. The final sequence in between two valid interruptions will be ifill + pfill. iend controls the desired length of the final sequence. See the illustration below (x is treatment (type)).
ifill controls the numbers at *:     *****   *****  (iend = 5L for example)
pfill controls the numbers at ?:     ?       ?
p0 identifies                  :     v       v
x looks like                   : 1 2 0 ...   0 ...
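To make "valid interruption" concrete: for a given n, a position counts only if it and the n - 1 positions before it are all 0. The following is a minimal sketch of that filter with a plain loop. The helper valid_positions and the toy vector are hypothetical and written for illustration; they mirror the idea behind p0, not the exact code of conditional_count.

```r
# Hypothetical helper: positions i where x[i], x[i - 1], ..., x[i - n + 1] are all 0.
valid_positions <- function(x, n) {
  p0 <- which(x == 0)
  for (k in seq_len(n - 1L)) {
    prev <- p0 - k
    # drop positions with no k-th predecessor or a non-zero k-th predecessor
    p0 <- p0[prev > 0L & x[pmax(prev, 1L)] == 0]
  }
  p0
}

# Toy vector: a 4-day interruption followed by a 2-day interruption.
x <- c(1, 0, 0, 0, 0, 2, 0, 0, 3)

print(valid_positions(x, 1L))  # every zero: 2 3 4 5 7 8
print(valid_positions(x, 3L))  # day 3 onward of the long interruption: 4 5
```

With n = 3 the 2-day interruption at positions 7 and 8 contributes nothing, which is the behaviour wanted for the two _3days columns.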
Using n = 1 as an example, your last problem simplifies to
conditional_count(x, 1L, function(p0) integer(length(p0)), seq_along, 30L)
ifill + pfill                               :     1 2 3 4 ... 1 2 3 4 ...
ifill is a sequence along the gap positions :     1 2 3 4 ... 1 2 3 4 ...
pfill is always 0 at all positions of p0    :     0           0
p0 identifies                               :     v           v
x looks like                                : 1 2 0 ......... 0
This problem simplifies to
conditional_count(x, 1L, function(p0) cumsum(p0 - head(c(-1L, p0), -1L) > 1L), function(x) integer(length(x)), Inf)
ifill + pfill                                   :     1 1 1 ...     2 2 ...
ifill is always 0 along the gap positions       :     0 0 0 ...     0 0 ...  (iend = Inf means filling in a sequence until the end of the gap)
pfill increases 1 at each starting streak of 0s :     1             2
p0 identifies                                   :     v v v         v v
x looks like                                    : 1 2 0 0 0 ....... 0 0 ...
conditional_count(x, 1L, seq_along, function(x) integer(length(x)), Inf)
ifill + pfill                             :     1 2 3 ...     4 5 ...
ifill is always 0 along the gap positions :     0 0 0 ...     0 0 ...  (iend = Inf means filling in a sequence until the end of the gap)
pfill increases 1 at each 0               :     1 2 3         4 5
p0 identifies                             :     v v v         v v
x looks like                              : 1 2 0 0 0 ....... 0 0 ...
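The two pfill choices in the calls above can be checked on their own, without running conditional_count. The sketch below applies them to a hypothetical vector of valid positions (two streaks, at 3:5 and 9:10). The name count_streak is only a label for the anonymous function passed in the first call.

```r
# pfill from the first call: increases by 1 whenever a position does not
# directly follow the previous one, i.e. at the start of each streak.
count_streak <- function(p0) cumsum(p0 - head(c(-1L, p0), -1L) > 1L)

# Hypothetical valid positions: one streak at 3:5, another at 9:10.
p0 <- c(3L, 4L, 5L, 9L, 10L)

streak_number <- count_streak(p0)  # 1 1 1 2 2 -> counts periods
day_number    <- seq_along(p0)     # 1 2 3 4 5 -> counts days

print(streak_number)
print(day_number)
```

Since ifill is all zeros in both calls, these pfill values are what gets carried forward across each gap until the next valid position.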
The complete script for this problem is
library(dplyr)  # provides %>% and mutate()

conditional_count <- function(x, n, pfill = function(p0) integer(length(p0)), ifill = seq_along, iend = 30L) {
  len <- length(x); out <- integer(len)
  p0 <- which(x == 0L)
  if (n > 1L)
    p0 <- Reduce(function(idx, i) {
      lidx <- idx - i + 1L
      idx <- idx[lidx > 0L]; lidx <- lidx[lidx > 0L]
      idx[x[lidx] == 0L]
    }, seq_len(n)[-1L], p0)
  if (length(p0) < 1L)
    return(out)
  ub <- pmin(c(tail(p0, -1L), len), p0 + iend - 1L)
  rl <- ub - p0 + 1L
  pfill <- pfill(p0)
  res <- unlist(lapply(seq_along(rl), function(i) ifill(integer(rl[[i]])) + pfill[[i]]))
  pos <- inverse.rle(list(lengths = rl, values = p0)) + unlist(lapply(rl, seq_len)) - 1L
  `[<-`(out, pos, res)
}

# pfill: running number of the current streak of valid positions
count_streak <- function(p0) cumsum(p0 - head(c(-1L, p0), -1L) > 1L)
# ifill: all zeros, so each gap simply carries the pfill value forward
integer_along <- function(x) integer(length(x))

# df is the data frame from the question
df %>%
  mutate(
    n_interruption_periods = conditional_count(`treatment (type)`, 1L, count_streak, integer_along, Inf),
    n_interruption_periods_3days = conditional_count(`treatment (type)`, 3L, count_streak, integer_along, Inf),
    n_interruption_days_3days = conditional_count(`treatment (type)`, 3L, seq_along, integer_along, Inf)
  )
Output
record_id start stop treatment (type) n_interruption_periods n_interruption_periods_3days n_interruption_days_3days
1 110101 0 1 1 0 0 0
2 110101 1 2 1 0 0 0
3 110101 2 3 1 0 0 0
4 110101 3 4 0 1 0 0
5 110101 4 5 0 1 0 0
6 110101 5 6 0 1 1 1
7 110101 6 7 0 1 1 2
8 110101 7 8 2 1 1 2
9 110101 8 9 2 1 1 2
10 110101 9 10 2 1 1 2
11 110101 10 11 0 2 1 2
12 110101 11 12 0 2 1 2
13 110101 12 13 0 2 2 3
14 110101 13 14 0 2 2 4
15 110101 14 15 0 2 2 5
16 110101 15 16 0 2 2 6
17 110101 16 17 3 2 2 6
18 110101 17 18 3 2 2 6
19 110101 18 19 0 3 2 6
20 110101 19 20 3 3 2 6
21 110101 20 21 3 3 2 6
22 110101 21 22 3 3 2 6
23 110101 22 23 0 4 2 6
24 110101 23 24 2 4 2 6
25 110101 24 25 2 4 2 6
26 110101 25 26 2 4 2 6
27 110101 26 27 0 5 2 6
28 110101 27 28 0 5 2 6
29 110101 28 29 0 5 3 7
30 110101 29 30 1 5 3 7
31 110101 30 31 1 5 3 7
32 110101 31 32 1 5 3 7
33 110101 32 33 1 5 3 7
34 110101 33 34 1 5 3 7
35 110101 34 35 1 5 3 7
36 110101 35 36 1 5 3 7
37 110101 36 37 1 5 3 7
38 110101 37 38 1 5 3 7
39 110101 38 39 1 5 3 7
40 110101 39 40 1 5 3 7
41 110101 40 41 1 5 3 7
42 110101 41 42 1 5 3 7
43 110101 42 43 1 5 3 7
44 110101 43 44 1 5 3 7
45 110101 44 45 1 5 3 7
46 110101 45 46 1 5 3 7
47 110101 46 47 1 5 3 7
48 110101 47 48 1 5 3 7
49 110101 48 49 1 5 3 7
50 110101 49 50 1 5 3 7
51 110101 50 51 1 5 3 7
52 110101 51 52 1 5 3 7
53 110101 52 53 1 5 3 7
54 110101 53 54 1 5 3 7
55 110101 54 55 1 5 3 7
56 110101 55 56 1 5 3 7
57 110101 56 57 1 5 3 7
58 110101 57 58 1 5 3 7
59 110101 58 59 1 5 3 7
60 110101 59 60 1 5 3 7
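As an independent cross-check of the output above, the same three columns can also be derived with run-length encoding. This is a minimal base R sketch of a different technique, not the method used in the answer. The helper interruption_counts and its min_days argument are hypothetical, and the sketch assumes the treatment vector has no missing values.

```r
# Hypothetical helper: the three running counts via rle(), assuming no NAs.
interruption_counts <- function(treatment, min_days = 3L) {
  is_off     <- treatment == 0            # TRUE on interruption days
  runs       <- rle(is_off)
  day_in_run <- sequence(runs$lengths)    # 1, 2, 3, ... within each run
  data.frame(
    # a period is counted on its first day
    n_interruption_periods       = cumsum(is_off & day_in_run == 1L),
    # a long period is counted on the day it reaches min_days
    n_interruption_periods_3days = cumsum(is_off & day_in_run == min_days),
    # every interruption day from day min_days onward is counted
    n_interruption_days_3days    = cumsum(is_off & day_in_run >= min_days)
  )
}

# First 12 values of treatment (type) from the data in the question.
x <- c(1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 0, 0)
res <- interruption_counts(x)
print(res)
```

On these 12 values the result agrees with rows 1 to 12 of the output printed above.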