Format multiple date formats in one column using lubridate


Problem description

Sometimes I am given data sets that use two different date formats but share common variables, and they have to be joined into one data frame. Over the years, I've tried various solutions to get around this workflow hassle. Now that I've been using lubridate, many of these problems seem easy to solve. However, I am encountering some behaviour that seems weird to me, though I imagine there is a good explanation that is beyond me. Say I am given data sets with different date formats that I join into one data frame. This data frame looks like this:

library(lubridate)
library(dplyr)

df<-data.frame(Lab=c("A","B"),DATE=c("12/15/15","12/15/2013")); df

I want to convert this column to dates with lubridate. However, the following does not parse the years consistently:

df %>% 
  mutate(mdy(DATE))

...but rather creates a date with the year 0015. If I filter for just Lab "A":

df %>% 
  filter(Lab=="A") %>%
  mutate(mdy(DATE))

...or even group_by Lab:

df %>% 
  group_by(Lab) %>%
  mutate(mdy(DATE))

Then I get the desired year format. Is this the correct behaviour of the lubridate family of date formatting functions? Is there a better way to accomplish what I am doing? I am sure that multiple date formats in one column is a relatively common (and annoying) occurrence.

Thanks.

Recommended answer

From the help on parse_date_time:

## ** how to use select_formats **
## By default %Y has precedence:
parse_date_time(c("27-09-13", "27-09-2013"), "dmy")
## [1] "13-09-27 UTC"   "2013-09-27 UTC"

## to give priority to %y format, define your own select_format function:

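my_select <- function(trained, drop = FALSE, ...) {
  # drop and ... absorb any extra arguments lubridate may pass to the selector
  # Score each candidate format: count its % fields, plus 1.5 if it contains %y
  n_fmts <- nchar(gsub("[^%]", "", names(trained))) + grepl("%y", names(trained)) * 1.5
  names(trained[which.max(n_fmts)])
}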

parse_date_time(c("27-09-13", "27-09-2013"), "dmy", select_formats = my_select)
## [1] "2013-09-27 UTC" "2013-09-27 UTC"
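
Applied to the data frame from the question, the same idea might look like the sketch below. It assumes the "mdy" order and reuses the selector logic from the help example under a hypothetical name, prefer_two_digit_year; the extra drop and ... arguments are an assumption added so the function tolerates additional arguments that parse_date_time may pass. Treat it as an illustration, not as the definitive fix.

```r
library(lubridate)
library(dplyr)

# Data frame from the question: one two-digit year, one four-digit year
df <- data.frame(Lab = c("A", "B"),
                 DATE = c("12/15/15", "12/15/2013"),
                 stringsAsFactors = FALSE)

# Hypothetical selector (name is ours), same scoring as my_select in the help:
# count the % fields in each candidate format and add 1.5 if it contains %y
prefer_two_digit_year <- function(trained, drop = FALSE, ...) {
  n_fmts <- nchar(gsub("[^%]", "", names(trained))) +
    grepl("%y", names(trained)) * 1.5
  names(trained[which.max(n_fmts)])
}

# Parse the whole column at once, then drop the time part
out <- df %>%
  mutate(DATE = as.Date(parse_date_time(DATE, orders = "mdy",
                                        select_formats = prefer_two_digit_year)))
out
```

This would also be consistent with the behaviour described in the question: with filter() or group_by(), each call to mdy() only sees strings of one year width, so the format guessed for that subset fits every value in it.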
