Matching multiple date values in R


Question

I have the following data frame, DF, describing people who have worked on a project, along with the date each started:

ID    ProjectName    StartDate 
1       Health        3/1/06 18:20
2       Education     2/1/07 15:30
1       Education     5/3/09 9:00
3       Wellness      4/1/10 12:00
2       Health        6/1/11 14:20

The goal is to find the first project for each ID, that is, the one with the earliest StartDate. For example, the expected output would be:

ID    ProjectName    StartDate 
1       Health        3/1/06 18:20
2       Education     2/1/07 15:30
3       Wellness      4/1/10 12:00

So far I have done the following to get the first StartDate for each ID:

library(plyr)
sub <- ddply(DF, .(ID), summarise, st = min(as.POSIXct(StartDate, format = "%m/%d/%y %H:%M")))

After this, I need to match each row in sub against the original DF and extract the project corresponding to that ID and StartDate. This could be done with a loop over the rows of sub, but my dataset is very large, so I would like to know whether there is an efficient way to do this matching and extract the subset from DF.

Recommended answer

This is fairly straightforward using match, because match returns:

a vector of the positions of first matches of its first argument in its second

So all you need to do is sort by date, use unique to get one instance of each ID, and then use match to find the first position of each ID in the sorted data. Thanks to @MatthewLunberg for providing a reproducible example of your data (the output below comes from that example, so its rows differ slightly from the table in the question):

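# Sort the rows chronologically, parsing StartDate with an explicit format
DF <- DF[ order(as.POSIXct(DF$StartDate, format="%m/%d/%y %H:%M")) , ]
# match() gives the first position of each ID, i.e. its earliest row
DF[ match( unique( DF$ID ) , DF$ID ) , ]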
#  ID ProjectName    StartDate
#6  1      Health 1/1/06 11:10
#2  2   Education 2/1/07 15:30
#4  3    Wellness 4/1/10 12:00
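
For anyone who wants to run this against the five rows shown in the question, here is a minimal self-contained sketch. The column types, the variable names when, sorted and first, and the tz = "UTC" setting are my assumptions; they are not part of the original post.

```r
# Rebuild the five rows from the question (column types are an assumption).
DF <- data.frame(
  ID          = c(1, 2, 1, 3, 2),
  ProjectName = c("Health", "Education", "Education", "Wellness", "Health"),
  StartDate   = c("3/1/06 18:20", "2/1/07 15:30", "5/3/09 9:00",
                  "4/1/10 12:00", "6/1/11 14:20"),
  stringsAsFactors = FALSE
)

# Parse the dates with an explicit format; tz = "UTC" is an assumption
# chosen so that parsing does not depend on the local time zone.
when <- as.POSIXct(DF$StartDate, format = "%m/%d/%y %H:%M", tz = "UTC")

# Sort chronologically, then keep the first row seen for each ID.
sorted <- DF[order(when), ]
first  <- sorted[match(unique(sorted$ID), sorted$ID), ]
print(first)
```

With this data the rows kept are Health for ID 1, Education for ID 2 and Wellness for ID 3, which agrees with the expected output in the question.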

One advantage is that the result keeps the row names that the rows had in the original data frame, before it was re-sorted. I do not know whether that is useful to you.
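
If you would rather not re-sort DF at all, the idea from the question (compare each row with the earliest start time for its ID) can be vectorised without a loop. This is only a sketch using base R's ave; the names secs and is_first and the tz = "UTC" setting are my assumptions, and unlike the match approach it returns more than one row for an ID when two projects share the same earliest start time.

```r
# Same five rows as in the question (column types are an assumption).
DF <- data.frame(
  ID          = c(1, 2, 1, 3, 2),
  ProjectName = c("Health", "Education", "Education", "Wellness", "Health"),
  StartDate   = c("3/1/06 18:20", "2/1/07 15:30", "5/3/09 9:00",
                  "4/1/10 12:00", "6/1/11 14:20"),
  stringsAsFactors = FALSE
)

# Convert each start time to seconds since the epoch (tz is an assumption).
secs <- as.numeric(as.POSIXct(DF$StartDate, format = "%m/%d/%y %H:%M", tz = "UTC"))

# ave() returns, for every row, the minimum start time within that row's ID.
is_first <- secs == ave(secs, DF$ID, FUN = min)

# Keep the rows whose start time equals the minimum for their ID.
print(DF[is_first, ])
```

The rows come back in their original order rather than in date order, so sort afterwards if the order matters.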
