How to summarize `Number of days since first date` and `Number of days seen` by ID for a large data frame

Problem description

The data frame df1 records detections of individuals (ID) over time (Date). As a short example:

library(lubridate)  # provides ymd()

df1 <- data.frame(ID = c(1,2,1,2,1,2,1,2,1,2),
                  Date = ymd(c("2016-08-21","2016-08-24","2016-08-23","2016-08-29","2016-08-27","2016-09-02","2016-09-01","2016-09-09","2016-09-01","2016-09-10")))

df1

   ID       Date
1   1 2016-08-21
2   2 2016-08-24
3   1 2016-08-23
4   2 2016-08-29
5   1 2016-08-27
6   2 2016-09-02
7   1 2016-09-01
8   2 2016-09-09
9   1 2016-09-01
10  2 2016-09-10

For each individual, I want to summarize both the number of days between its first and last detection (Ndays) and the number of distinct days on which it was detected (Ndifdays).

Additionally, I would like the summary table to include a variable called Prop, which is simply Ndifdays divided by Ndays.

The summary table I expect is:

> Result
  ID Ndays Ndifdays  Prop
1  1    11        4 0.364 # Between 21st Aug and 1st Sept there are 11 days.
2  2    17        5 0.294 # Between 24th Aug and 10th Sept there are 17 days.

Does anyone know how to do this?

Recommended answer

Using dplyr:

library(dplyr)

df1 %>%
   group_by(ID) %>%
   summarise(Ndays =  as.integer(max(Date) - min(Date)), 
             Ndifdays = n_distinct(Date), 
             Prop = Ndifdays/Ndays)

#     ID Ndays Ndifdays  Prop
#   <dbl> <int>    <int> <dbl>
#1     1    11        4 0.364
#2     2    17        5 0.294
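
To see where the numbers for ID 1 come from, here is a minimal base R check that uses only the five dates listed for ID 1 in the example data. The variable names (d, ndays, ndifdays, prop) are chosen for illustration and are not part of the original answer.

```r
# Dates for ID 1, copied from the example data (2016-09-01 appears twice)
d <- as.Date(c("2016-08-21", "2016-08-23", "2016-08-27",
               "2016-09-01", "2016-09-01"))

ndays    <- as.integer(max(d) - min(d))  # days between first and last detection
ndifdays <- length(unique(d))            # distinct detection days; the duplicate counts once
prop     <- ndifdays / ndays             # share of days with a detection

c(Ndays = ndays, Ndifdays = ndifdays, Prop = round(prop, 3))
```

Because the duplicated date is counted only once, Ndifdays is 4 rather than 5, which is why the answer uses n_distinct(Date) and not n().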


The data.table version is:

library(data.table)
df12 <- setDT(df1)[, .(Ndays = as.integer(max(Date) - min(Date)), 
                       Ndifdays = uniqueN(Date)), by = ID]
df12$Prop <- df12$Ndifdays/df12$Ndays

And with aggregate from base R:

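df12 <- aggregate(Date ~ ID, df1, function(x) c(Ndays = as.integer(max(x) - min(x)), Ndifdays = length(unique(x))))
df12 <- cbind(df12["ID"], df12$Date)  # aggregate returns a matrix column; flatten it into Ndays and Ndifdays
df12$Prop <- df12$Ndifdays / df12$Ndays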
