Earliest Date for each id in R


Problem description

I have a dataset where each individual (id) has an e_date. Since each individual can have more than one e_date, I'm trying to get the earliest date for each individual. Basically, I would like a dataset with one row per id showing that id's earliest e_date value. I used the aggregate function to find the minimum values, created a new variable combining the date and the id, and finally subset the original dataset using that new variable to match the rows containing the minimums. This is what I came up with:

new <- aggregate(e_date ~ id, data_full, min)

data_full["comb"] <- NULL
data_full$comb <- paste(data_full$id,data_full$e_date)

new["comb"] <- NULL
new$comb <- paste(new$lopnr,new$EDATUM)

data_fixed <- data_full[which(new$comb %in% data_full$comb),]

The first problem is that the aggregate function doesn't seem to work at all: it reduces the number of rows, but viewing the data I can clearly see that some ids still appear more than once with different e_date values. Also, the code gives me different results when I convert e_date with as.Date instead of keeping its original format (integer). I think the answer is simple, but I'm stuck on this one.

Recommended answer

We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(data_full)), order the rows by 'e_date', and, grouped by 'id', take the first row (head(.SD, 1L)).

library(data.table)
setDT(data_full)[order(e_date), head(.SD, 1L), by = id]
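To see the data.table call in action, here is a minimal sketch on made-up data. The toy values and the extra `value` column are assumptions for illustration only; they do not come from the original question.

```r
library(data.table)

# Hypothetical toy data: two rows each for id 1 and id 2, one row for id 3
data_full <- data.frame(
  id     = c(1, 1, 2, 2, 3),
  e_date = as.Date(c("2015-03-01", "2014-07-15", "2016-01-10",
                     "2016-01-05", "2013-12-31")),
  value  = c("a", "b", "c", "d", "e")
)

# Sort all rows by e_date, then keep the first row within each id
earliest <- setDT(data_full)[order(e_date), head(.SD, 1L), by = id]
print(earliest)
```

Because the rows are sorted before grouping, the first row of each group is the one with the smallest e_date, and the other columns of that row are carried along.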




Or, using dplyr: after grouping by 'id', arrange by 'e_date' (assuming it is of Date class) and take the first row of each group with slice.

library(dplyr)
data_full %>%
    group_by(id) %>%
    arrange(e_date) %>%
    slice(1L)
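As a possible variant, recent dplyr versions (1.0.0 and later) provide slice_min, which combines the ordering and the slicing in one step. The sketch below uses made-up toy data, which is an assumption for illustration; with_ties = FALSE is set so that exactly one row per id is returned even if the earliest date occurs more than once.

```r
library(dplyr)

# Hypothetical toy data, not taken from the original question
data_full <- data.frame(
  id     = c(1, 1, 2, 2, 3),
  e_date = as.Date(c("2015-03-01", "2014-07-15", "2016-01-10",
                     "2016-01-05", "2013-12-31")),
  value  = c("a", "b", "c", "d", "e")
)

# Within each id, keep the single row with the smallest e_date
earliest <- data_full %>%
  group_by(id) %>%
  slice_min(e_date, n = 1, with_ties = FALSE) %>%
  ungroup()
print(earliest)
```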






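If we need a base R option, ave can be used. Note that ave returns a vector of the same type as its first argument, so the result is compared with 1 to obtain a logical row index.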

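# ave() returns 0/1 numbers here, so "== 1" turns them into a logical index;
# ties.method = "first" keeps exactly one row per id when dates are tied
data_full[with(data_full, ave(as.numeric(e_date), id, FUN = function(x) rank(x, ties.method = "first") == 1) == 1), ]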
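The aggregate-based idea from the question can also be made to work in base R by joining the minimums back with merge instead of pasting a combined key. In the question's last line, which(new$comb %in% data_full$comb) yields positions in new rather than in data_full, which is one likely reason the subset came out wrong. The sketch below is one way to do this under stated assumptions: the toy data are made up, and e_date is assumed to be stored as a yyyymmdd integer, which the question does not confirm.

```r
# Hypothetical toy data; e_date is ASSUMED to be a yyyymmdd integer
data_full <- data.frame(
  id     = c(1L, 1L, 2L, 2L, 3L),
  e_date = c(20150301L, 20140715L, 20160110L, 20160105L, 20131231L),
  value  = c("a", "b", "c", "d", "e")
)

# Convert to Date so comparisons are made on real dates
data_full$e_date <- as.Date(as.character(data_full$e_date), format = "%Y%m%d")

# Earliest e_date per id
earliest <- aggregate(e_date ~ id, data = data_full, FUN = min)

# Join back on both keys to recover the remaining columns
data_fixed <- merge(data_full, earliest, by = c("id", "e_date"))
print(data_fixed)
```

If an id has its earliest date on several rows, merge keeps all of them; deduplicating on id afterwards (for example with !duplicated(data_fixed$id)) would reduce that to one row per id.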
