Ignore value conditionally within group_by in dplyr


Question

Please consider the following.

Background

In a data.frame I have patient IDs (id), the day on which each patient was admitted to a hospital (day), a code for the diagnostic activity they received that day (code), a price for that activity (price) and a frequency for that activity (freq).

Activities with code "b" and "c" are registered at the same time but mean more or less the same thing, so they should not be double counted.

Problem

What I want is: if code "b" and "c" are registered for the same patient on the same day, code "b" should be ignored.

The example data.frame looks like this:

library(dplyr)

# Example data: one row per patient (id), day and diagnostic activity (code)
x <- data.frame(id = c(rep("a", 4), rep("b", 3)),
                day = c(1, 1, 1, 2, 1, 2, 3),
                price = c(500, 10, 100, rep(10, 3), 100),
                code = c("a", "b", "c", rep("b", 3), "c"),
                freq = c(rep(1, 5), rep(2, 2)))

> x
  id day price code freq
1  a   1   500    a    1
2  a   1    10    b    1
3  a   1   100    c    1
4  a   2    10    b    1
5  b   1    10    b    1
6  b   2    10    b    2
7  b   3   100    c    2

So the cost for patient "a" on day 1 should be 600, not the 610 that I get with the following:

x %>% 
  group_by(id, day) %>% 
  summarise(res = sum(price * freq))

# A tibble: 5 x 3
# Groups:   id [?]
  id      day   res
  <fct> <dbl> <dbl>
1 a        1.  610.
2 a        2.   10.
3 b        1.   10.
4 b        2.   20.
5 b        3.  200.


Possible approaches

Either I delete the observations with code "b" when "c" is present on the same day, or I set freq of code "b" to 0 when code "c" is present on the same day.

All my attempts with ifelse and mutate have failed so far.

Any help is much appreciated. Thank you very much in advance!

Recommended answer

You can add a filter line after group_by to remove the offending "b" rows. Because the data is grouped by id and day, "c" %in% code is evaluated separately for each patient and day:

x %>% 
  group_by(id, day) %>% 
  filter(!(code=="b" & "c" %in% code)) %>% 
  summarise(res = sum(price * freq))

  id      day   res
  <fct> <dbl> <dbl>
1 a        1.  600.
2 a        2.   10.
3 b        1.   10.
4 b        2.   20.
5 b        3.  200.
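
The second idea from the question, setting freq to 0 instead of dropping the rows, can be sketched in the same way. This is only an illustration and not part of the original answer. It assumes the example data.frame x from the question (recreated here so the snippet runs on its own) and uses ifelse inside a grouped mutate; the final ungroup() is an optional addition.

```r
library(dplyr)

# Example data recreated from the question
x <- data.frame(id = c(rep("a", 4), rep("b", 3)),
                day = c(1, 1, 1, 2, 1, 2, 3),
                price = c(500, 10, 100, rep(10, 3), 100),
                code = c("a", "b", "c", rep("b", 3), "c"),
                freq = c(rep(1, 5), rep(2, 2)))

res <- x %>%
  group_by(id, day) %>%
  # Within each (id, day) group: if a "c" row exists, set freq of "b" rows to 0
  mutate(freq = ifelse(code == "b" & "c" %in% code, 0, freq)) %>%
  summarise(res = sum(price * freq)) %>%
  ungroup()

res
```

For the example data this should give the same totals as the filter version, since a row with freq 0 contributes nothing to sum(price * freq). The filter version is shorter; the mutate version keeps every row available if the rows are needed before summarising.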
