How to summarize `Number of days since first date` and `Number of days seen` by ID and for a large data frame
Question
The data frame `df1` summarizes detections of individuals (`ID`) through time (`Date`). As a short example:
library(lubridate)  # provides ymd()
df1 <- data.frame(ID = c(1,2,1,2,1,2,1,2,1,2),
                  Date = ymd(c("2016-08-21","2016-08-24","2016-08-23","2016-08-29","2016-08-27","2016-09-02","2016-09-01","2016-09-09","2016-09-01","2016-09-10")))
df1
   ID       Date
1   1 2016-08-21
2   2 2016-08-24
3   1 2016-08-23
4   2 2016-08-29
5   1 2016-08-27
6   2 2016-09-02
7   1 2016-09-01
8   2 2016-09-09
9   1 2016-09-01
10  2 2016-09-10
I want to summarize both the number of days since the first detection of the individual (`Ndays`) and the number of distinct days on which the individual has been detected since that first detection (`Ndifdays`).
Additionally, I would like to include in this summary table a variable called `Prop` that simply divides `Ndifdays` by `Ndays`.
The summary table I would expect is this:
> Result
  ID Ndays Ndifdays  Prop
1  1    11        4 0.364 # Between 21st Aug and 1st Sept there are 11 days.
2  2    17        5 0.294 # Between 24th Aug and 10th Sept there are 17 days.
Does anyone know how to do it?
Recommended answer
Using dplyr
library(dplyr)
df1 %>%
  group_by(ID) %>%
  summarise(Ndays = as.integer(max(Date) - min(Date)),
            Ndifdays = n_distinct(Date),
            Prop = Ndifdays / Ndays)
#      ID Ndays Ndifdays  Prop
#   <dbl> <int>    <int> <dbl>
# 1     1    11        4 0.364
# 2     2    17        5 0.294
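The figures for ID 1 can be checked by hand. The following is a minimal base R sketch that uses only the dates listed for ID 1 in the question; the variable names (`dates_id1`, `ndays`, `ndifdays`, `prop`) are illustrative and not part of the original answer.

```r
# Detection dates for ID 1, copied from the example data in the question
dates_id1 <- as.Date(c("2016-08-21", "2016-08-23", "2016-08-27",
                       "2016-09-01", "2016-09-01"))

# Span in days between the first and the last detection
ndays <- as.integer(max(dates_id1) - min(dates_id1))

# Number of distinct detection dates (2016-09-01 appears twice, counted once)
ndifdays <- length(unique(dates_id1))

# Proportion of the span on which the individual was seen
prop <- ndifdays / ndays

round(prop, 3)
```

Note that `Ndays` computed this way is the difference between the last and the first date, so the first day itself is not counted in the span. Whether to add 1 for an inclusive count depends on how the proportion is meant to be read, which the question does not specify.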
The data.table version is
library(data.table)
df12 <- setDT(df1)[, .(Ndays = as.integer(max(Date) - min(Date)),
Ndifdays = uniqueN(Date)), by = ID]
df12$Prop <- df12$Ndifdays/df12$Ndays
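For a large data frame, the ratio can also be added by reference instead of through `$<-`. The sketch below is one possible variant, not the original answer: it rebuilds the example data with `as.Date()` so it runs without lubridate, and it assumes that an individual detected on a single date (where `Ndays` is 0) should get `NA` rather than `Inf`. The names `dt` and `res` are illustrative.

```r
library(data.table)

# Example data from the question, built without lubridate
dt <- data.table(
  ID   = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  Date = as.Date(c("2016-08-21", "2016-08-24", "2016-08-23", "2016-08-29",
                   "2016-08-27", "2016-09-02", "2016-09-01", "2016-09-09",
                   "2016-09-01", "2016-09-10"))
)

# One row per ID: span in days and number of distinct detection dates
res <- dt[, .(Ndays    = as.integer(max(Date) - min(Date)),
              Ndifdays = uniqueN(Date)),
          by = ID]

# Add Prop by reference; assumption: IDs with Ndays == 0 get NA instead of Inf
res[, Prop := fifelse(Ndays > 0L, Ndifdays / Ndays, NA_real_)]

res
```

Using `:=` modifies `res` in place, which avoids copying the summary table. The `fifelse()` guard only matters when some ID has a single detection date, which does not occur in the example data.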
And with `aggregate` from base R:
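df12 <- aggregate(Date ~ ID, df1, function(x) c(Ndays = as.integer(max(x) - min(x)), Ndifdays = length(unique(x))))
df12 <- cbind(df12["ID"], df12$Date)  # aggregate() returns both values in one matrix column; split it into real columns
df12$Prop <- df12$Ndifdays / df12$Ndays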