Header names as dates in R


Question

I'm trying to calculate the "death" of users, meaning I want to determine the time between when a user signs up for a program and when they are no longer active in it. I have two files, which I read in using read.csv("filename",header=TRUE):

 >   df
      name   start.date
1  Allison   2013-03-16
2   Andrew   2013-03-16
3     Carl   2013-03-16
4     Dora   2013-03-17
5   Hilary   2013-03-17
6    Louis   2013-03-19
7     Mary   2013-03-20
8   Mickey   2013-03-20

and file 2:

> df2
       names X04.16.2013 X04.17.2013 X04.18.2014  X04.19.2013
2001 Allison           5           5           0           0
2002  Andrew           0           0           0           0
2003    Carl           8           8           11          10
2004    Dora           6           4           9           3
2005  Hilary           2           0           0           0
2006   Louis           18         10           8           3
2007    Mary           4           7           7           0
2008  Mickey           9           5           0           0

What I would like to do is convert the header names of df2 to dates, so I can then create a new data frame that has the user names, start date, and "days to death", which is the number of days from the start date to the first date on which the user has an entry of 0 in df2:

      name   start.date   days.to.death
1  Allison   2013-03-16   33
2   Andrew   2013-03-16   0
3     Carl   2013-03-16   NA
4     Dora   2013-03-17   NA
5   Hilary   2013-03-17   31
6    Louis   2013-03-19   NA
7     Mary   2013-03-20   30
8   Mickey   2013-03-20   28

Note that Andrew was never "alive" and Carl, Dora, and Louis haven't "died" yet. I'm still rather new to R so any input is much appreciated!

Recommended answer

Assuming a typo in your column headers for df2, the following solution using dplyr and tidyr gets you most of the way there...

  library(tidyr)
  library(dplyr)

  colnames(df) <- c("names", "start")  # to join the data frames, the first column header must match df2's "names"
  df$start <- as.Date(df$start, format = "%Y-%m-%d")  # the start dates shown are written as YYYY-MM-DD

The following works on df2 by reshaping the data to long format, parsing the header names as dates (similar to MrFlick's suggestion), and then keeping only the rows whose value is 0. It then takes the first such row per user (i.e. the earliest date, assuming your date columns are in chronological order from left to right). Finally it calculates the difference between that date (the end date) and the start date from df. I've used the same format as MrFlick, but you could simply calculate the difference as an integer.

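  df2 %>%
    filter(X04.16.2013 != 0) %>%    # removes Andrew, who has 0 in the first date column
    gather(key, value, 2:5) %>%     # long format: one row per user and date column
    mutate(date = as.Date(key, format = "X%m.%d.%Y")) %>%  # parse the header names as dates
    left_join(df, by = "names") %>%
    filter(value == 0) %>%          # keep only the dates with a 0 entry
    group_by(names) %>%
    filter(date == nth(date, 1)) %>%  # first 0 entry per user
    select(names, start, date) %>%
    mutate(daydiff = difftime(date, start, units = "days"))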

This gives...

    names      start       date daydiff
1  Hilary 2013-03-17 2013-04-17 31 days
2 Allison 2013-03-16 2013-04-18 33 days
3  Mickey 2013-03-20 2013-04-18 29 days
4    Mary 2013-03-20 2013-04-19 30 days
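
The header-to-date step on its own may be easier to follow in isolation. The sketch below assumes the headers look exactly as printed in the question, with the leading "X" that read.csv() adds to names starting with a digit; the `headers` vector and the `suspect` check are illustrative additions, not part of the original answer.

```r
# Header names copied from the question's df2, including the 2014 entry
# that the answer assumes is a typo.
headers <- c("X04.16.2013", "X04.17.2013", "X04.18.2014", "X04.19.2013")

# The literal "X" and "." in the format string match the mangled header text.
dates <- as.Date(headers, format = "X%m.%d.%Y")

# Illustrative check: flag headers whose year differs from the most common one.
years <- format(dates, "%Y")
suspect <- headers[years != names(which.max(table(years)))]
suspect
```

If the third header really is a typo, renaming that column before reshaping keeps the dates in chronological order, which the "first zero" logic in the pipeline relies on.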

It should be pretty easy to put in the NAs and those who never lived. Perhaps this helps a little?
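
One possible way to fill in the NAs and the never-active users is sketched below in base R. It is only an illustration under stated assumptions: the data are re-entered from the question, the X04.18.2014 header is treated as a typo for X04.18.2013, "never alive" is coded as 0 as in the question's desired output, and `days_to_death` is a hypothetical helper name.

```r
# Data re-entered from the question; X04.18.2014 is assumed to be a typo
# and is written here as X04.18.2013.
df <- data.frame(
  name = c("Allison", "Andrew", "Carl", "Dora",
           "Hilary", "Louis", "Mary", "Mickey"),
  start.date = as.Date(c("2013-03-16", "2013-03-16", "2013-03-16", "2013-03-17",
                         "2013-03-17", "2013-03-19", "2013-03-20", "2013-03-20"))
)
df2 <- data.frame(
  names = df$name,
  X04.16.2013 = c(5, 0, 8, 6, 2, 18, 4, 9),
  X04.17.2013 = c(5, 0, 8, 4, 0, 10, 7, 5),
  X04.18.2013 = c(0, 0, 11, 9, 0, 8, 7, 0),
  X04.19.2013 = c(0, 0, 10, 3, 0, 3, 0, 0)
)

# Parse the header names (all columns except the first) as dates.
dates <- as.Date(names(df2)[-1], format = "X%m.%d.%Y")
counts <- as.matrix(df2[-1])

# Hypothetical helper: days from a user's start date to their first 0 entry.
days_to_death <- function(row, start) {
  zero <- which(row == 0)
  if (length(zero) == 0) return(NA_real_)  # no 0 entry yet: still active
  if (zero[1] == 1) return(0)              # 0 in the first column: never active
  as.numeric(dates[zero[1]] - start)
}

# Align the rows of df2 with df by user name, then apply the helper per user.
idx <- match(df$name, df2$names)
df$days.to.death <- vapply(
  seq_len(nrow(df)),
  function(i) days_to_death(counts[idx[i], ], df$start.date[i]),
  numeric(1)
)
df
```

With these inputs Mickey comes out at 29 days, which agrees with the pipeline output above rather than the 28 shown in the question's desired table.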
