How do I determine in R if a date interval overlaps another date interval for the same individual in a data frame?

Problem description

I have a dataset of hospital claims. Each row is a claim, and I have the following columns: patient id, start date, and end date. Patients can have multiple claims if they visited the hospital multiple times. I'm trying to calculate the total time the patient spent in the hospital based on all the claims in the dataset.

library(tibble)
df <- tribble(
  ~id, ~start_date, ~end_date,
  "100003186", "2011-06-18", "2011-08-09",
  "100003186", "2011-06-18", "2011-08-23",
  "100003186", "2011-12-14", "2011-12-16",
  "100003186", "2014-09-14", "2014-09-17",
  "100003186", "2014-09-10", "2014-09-18",
  "100003187", "2011-11-18", "2011-11-30",
  "100003187", "2011-11-18", "2011-11-23",
)

The problem is that some claims overlap in their dates. For example, for id=="100003186", the first claim is from date 2011-06-18 to 2011-08-09, but this time period is already contained in the second claim, from date 2011-06-18 to 2011-08-23.

How can I delete the rows where the time interval is contained within the interval of another claim for the same individual (id)?

This question offers a possible solution, but I'd like to implement it by id: R: Determine if each date interval overlaps with all other date intervals in a dataframe

Recommended answer

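Sort by id and start date (putting the longer claim first when start dates tie), then look for any claim whose end date is no later than the previous claim's end date.

library(dplyr)
df %>%
  # Ties on start_date are broken by descending end_date, so a claim that shares
  # its start date with a longer claim sorts after it and can be flagged.
  arrange(id, start_date, desc(end_date)) %>%
  group_by(id) %>%
  # ISO-formatted date strings compare correctly as text; lag() is NA on the first row per id
  mutate(contained = end_date <= lag(end_date)) %>%
  filter(!contained | is.na(contained))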

This is "weak containment", i.e. it also deletes claims that share a start date and/or end date with the claim that contains them. If you don't want that, adjust the calculation of contained as appropriate. The is.na call in the last line ensures we don't delete the first row for each id.
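
One caveat: lag() only looks one row back, so a claim nested inside an earlier, longer claim can slip through when another short claim sits between them. A minimal sketch of one way around that is to compare each end date with the running maximum of all earlier end dates for the same id. The claims data and the prev_max_end column below are made up for illustration, and the dates are converted with as.Date first.

```r
library(dplyr)
library(tibble)

# Hypothetical data: two short claims nested inside one long claim
claims <- tribble(
  ~id, ~start_date, ~end_date,
  "A", "2020-01-01", "2020-01-31",
  "A", "2020-01-05", "2020-01-07",
  "A", "2020-01-10", "2020-01-12",
  "A", "2020-03-01", "2020-03-02"
) %>%
  mutate(across(c(start_date, end_date), as.Date))

kept <- claims %>%
  arrange(id, start_date, desc(end_date)) %>%
  group_by(id) %>%
  mutate(
    # cummax() is not defined for Date objects, so work on the numeric day count
    prev_max_end = lag(cummax(as.numeric(end_date))),
    # TRUE when some earlier claim for this id ends on or after this claim's end
    contained = as.numeric(end_date) <= prev_max_end
  ) %>%
  filter(!contained | is.na(contained)) %>%
  ungroup() %>%
  select(id, start_date, end_date)
```

With plain lag(end_date), the third claim (2020-01-10 to 2020-01-12) would be kept, because it is only compared with the second claim; the running maximum compares it with the first claim as well.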

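If the end goal is the total time in hospital, dropping contained claims may not be enough, because two claims can also overlap only partially. A rough sketch, not part of the original answer, is to merge overlapping claims into stays and then sum the stay lengths. The column names prev_max_end, new_stay, stay and days are illustrative, and counting a stay as end_date minus start_date (end day not included) is an assumption to adjust as needed.

```r
library(dplyr)
library(tibble)

# The data from the question, with the date columns converted to Date
df <- tribble(
  ~id, ~start_date, ~end_date,
  "100003186", "2011-06-18", "2011-08-09",
  "100003186", "2011-06-18", "2011-08-23",
  "100003186", "2011-12-14", "2011-12-16",
  "100003186", "2014-09-14", "2014-09-17",
  "100003186", "2014-09-10", "2014-09-18",
  "100003187", "2011-11-18", "2011-11-30",
  "100003187", "2011-11-18", "2011-11-23"
) %>%
  mutate(across(c(start_date, end_date), as.Date))

total_days <- df %>%
  arrange(id, start_date, end_date) %>%
  group_by(id) %>%
  mutate(
    # Latest end date among all earlier claims for this id (NA on the first row)
    prev_max_end = lag(cummax(as.numeric(end_date))),
    # A new stay begins when a claim starts after every earlier claim has ended
    new_stay = is.na(prev_max_end) | as.numeric(start_date) > prev_max_end,
    stay = cumsum(new_stay)
  ) %>%
  group_by(id, stay) %>%
  # Collapse each run of overlapping claims into a single interval
  summarise(start_date = min(start_date), end_date = max(end_date),
            .groups = "drop") %>%
  group_by(id) %>%
  # Assumption: length of a stay is end_date - start_date, in days
  summarise(days = sum(as.numeric(end_date - start_date)), .groups = "drop")
```

Add 1 per stay inside the sum if the discharge day should count as a full day.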