Ignore value conditionally within group_by in dplyr
Question
Please consider the following.
Background
In a data.frame I have patient IDs (id), the day on which patients are admitted to a hospital (day), a code for the diagnostic activity they received that day (code), a price for that activity (price), and a frequency for that activity (freq).
Activities with codes b and c are registered at the same time but mean more or less the same thing, so they should not be double counted.
Problem
What I want is: if codes "b" and "c" are registered for the same day, code "b" should be ignored.
The example data.frame looks like this:
x <- data.frame(id = c(rep("a", 4), rep("b", 3)),
                day = c(1, 1, 1, 2, 1, 2, 3),
                price = c(500, 10, 100, rep(10, 3), 100),
                code = c("a", "b", "c", rep("b", 3), "c"),
                freq = c(rep(1, 5), rep(2, 2)))
> x
  id day price code freq
1  a   1   500    a    1
2  a   1    10    b    1
3  a   1   100    c    1
4  a   2    10    b    1
5  b   1    10    b    1
6  b   2    10    b    2
7  b   3   100    c    2
So the cost for patient "a" on day 1 should be 600, not the 610 that I get with the following:
library(dplyr)
x %>%
  group_by(id, day) %>%
  summarise(res = sum(price * freq))
# A tibble: 5 x 3
# Groups:   id [?]
  id      day   res
  <fct> <dbl> <dbl>
1 a        1.  610.
2 a        2.   10.
3 b        1.   10.
4 b        2.   20.
5 b        3.  200.
Possible approaches
Either I delete the observations with code "b" when "c" is present on that same day, or I set freq of code "b" to 0 when code "c" is present on the same day.
All my attempts with ifelse and mutate have failed so far.
Any help is much appreciated. Thank you very much in advance!
Recommended answer
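You can add a filter line after group_by to remove the offending b values. Since the data are grouped by id and day, "c" %in% code is evaluated per group, so a b row is dropped only when a c row exists for the same patient and day: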
x %>%
  group_by(id, day) %>%
  filter(!(code == "b" & "c" %in% code)) %>%
  summarise(res = sum(price * freq))
  id      day   res
  <fct> <dbl> <dbl>
1 a        1.  600.
2 a        2.   10.
3 b        1.   10.
4 b        2.   20.
5 b        3.  200.
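The question also mentions a second approach: setting freq to 0 for the offending b rows instead of deleting them. Below is a minimal sketch of that variant, assuming the same example data as in the question; the name res_mutate is only illustrative and not part of the original answer.

```r
library(dplyr)

# Example data copied from the question
x <- data.frame(id    = c(rep("a", 4), rep("b", 3)),
                day   = c(1, 1, 1, 2, 1, 2, 3),
                price = c(500, 10, 100, rep(10, 3), 100),
                code  = c("a", "b", "c", rep("b", 3), "c"),
                freq  = c(rep(1, 5), rep(2, 2)))

# Keep every row, but set freq to 0 for "b" rows whose (id, day) group
# also contains a "c" row; "c" %in% code is evaluated once per group
res_mutate <- x %>%
  group_by(id, day) %>%
  mutate(freq = ifelse(code == "b" & "c" %in% code, 0, freq)) %>%
  summarise(res = sum(price * freq)) %>%
  ungroup()

res_mutate
```

This should give the same totals as the filter version (600 for patient a on day 1). Which one to prefer probably depends on whether the b rows are still needed elsewhere: filter drops them, while mutate keeps them with a zero frequency.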