How to check if an id that comes into the data on a particular date stays until an exit date

Question

I have a data set that looks something like the one below. I want to check that any id present at the beginning of a year (in this case January 1, 2003) is also present every day until the end of that year (December 31, 2003). The check should then start over at the beginning of the next year, because the set of people may change from year to year but should not change within a year. If an id is not present on some day, I would like to know which day and which id.

I first started with a for loop and checked every two days, but this is extremely inefficient since my data set spans roughly 50 years and will grow later on with new data.

dates <- rep(seq(as.Date("2003/01/01"), as.Date("2004/12/31"), "days"),each = 3)
id <- rep(1:3,times = length(unique(dates))) 
df <- data.frame( dates = dates,id = id)

Edit: The chunk above contains every date, but if I delete, for example, id = 1 on the second day, the code should tell me it is missing, so the count should no longer be the same. The line below deletes id = 1 on the second day.

df <- df[-4,]

The code below builds the same data set but deletes id = 1 for January 2, 2003 and January 3, 2003. I am trying to get something that returns the id that is missing and the date.

dates <- rep(seq(as.Date("2003/01/01"), as.Date("2004/12/31"), "days"),each = 3)
id <- rep(1:3,times = length(unique(dates))) 
df <- data.frame( dates = dates,id = id)
df <- df[-4,]
df <- df[-6,]


Recommended answer

This code chunk counts the number of days on which each person appears in each year. If the count is 365 (or 366 in a leap year), that person was there every day of the year.

library(dplyr)
library(tidyr)

dates <- rep(seq(as.Date("2003/01/01"), as.Date("2004/12/31"), "days"),each = 3)
id <- rep(1:3,times = length(unique(dates))) 
df <- data.frame( dates = dates,id = id)

dfx <- df %>%
  mutate(yrs = lubridate::year(dates)) %>%
  group_by(id, dates) %>%
  filter(row_number() == 1) %>%   # keep one row per id per day
  group_by(id, yrs) %>%
  tally()                         # n = number of days present per id and year
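
To turn those counts into a direct check, one option is to compare each id's count with the number of calendar days in that year. The sketch below is layered on top of the answer rather than taken from it: the helper `days_in_year` and the column names `n_days` and `expected` are made up for illustration, and the example data is the question's data set with id 1 removed on January 2 and 3, 2003.

```r
library(dplyr)

# Example data from the question, with id 1 removed on 2003-01-02 and 2003-01-03
dates <- rep(seq(as.Date("2003/01/01"), as.Date("2004/12/31"), "days"), each = 3)
id <- rep(1:3, times = length(unique(dates)))
df <- data.frame(dates = dates, id = id)
dfa <- df[c(-4, -7), ]

# Hypothetical helper: number of calendar days in year y (vectorised)
days_in_year <- function(y) {
  as.integer(as.Date(paste0(y + 1, "-01-01")) - as.Date(paste0(y, "-01-01")))
}

incomplete <- dfa %>%
  mutate(yrs = as.integer(format(dates, "%Y"))) %>%
  distinct(id, yrs, dates) %>%            # one row per id per day
  count(id, yrs, name = "n_days") %>%     # days present per id and year
  mutate(expected = days_in_year(yrs)) %>%
  filter(n_days < expected)               # keep only incomplete id-years

incomplete
```

With this example data, `incomplete` should contain a single row for id 1 in 2003 (363 of 365 days). Note that this assumes every year in the data is fully covered; a partial final year would be flagged for every id.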



# Remove id 1 on 2003-01-02 and 2003-01-03 (rows 4 and 7), as in the question
dfa <- df[c(-4, -7), ]

To find the dates on which an id is missing, add an indicator column to the data set, then fill in the missing dates for each id. After this, the val column will be NA on the rows that were filled in. Filter on those rows to get the dates where an id went missing.

dfx <- dfa %>%
  mutate(val = 1) %>%
  complete(nesting(id),
           dates = seq(min(dates), max(dates), by = "day")) %>%
  filter(is.na(val))
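
The question also asks that the check restart every year, using only the ids present on January 1 of that year. A minimal sketch of that variant follows. It rests on an assumption about the intended rule: an id is expected on every observed calendar day of a year only if it appears on January 1 of that year. The names `starters`, `all_days`, `expected`, `missing`, and `yr` are illustrative and do not come from the original answer.

```r
library(dplyr)

# Example data from the question, with id 1 removed on 2003-01-02 and 2003-01-03
dates <- rep(seq(as.Date("2003/01/01"), as.Date("2004/12/31"), "days"), each = 3)
id <- rep(1:3, times = length(unique(dates)))
df <- data.frame(dates = dates, id = id)
dfa <- df[c(-4, -7), ]

dfa <- dfa %>% mutate(yr = as.integer(format(dates, "%Y")))

# ids that are present on January 1 of each year
starters <- dfa %>%
  filter(format(dates, "%m-%d") == "01-01") %>%
  distinct(id, yr)

# Every calendar day covered by the data, tagged with its year
all_days <- data.frame(dates = seq(min(dfa$dates), max(dfa$dates), by = "day"))
all_days$yr <- as.integer(format(all_days$dates, "%Y"))

# Every (id, day) pair that should exist: each starter id on each day of its year
expected <- merge(starters, all_days, by = "yr")

# Expected pairs that do not occur in the data
missing <- expected %>%
  anti_join(dfa, by = c("id", "dates")) %>%
  arrange(id, dates)

missing
```

With this example data, `missing` should list id 1 on 2003-01-02 and 2003-01-03. Because the expected days are built per year, an id that legitimately leaves at a year boundary is not reported for the following year.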
