Subset dates in the wrong format in R


Problem description

I have questionnaire data where participants have inputted their date of birth in a wide variety of formats:

ID <- c(101,102,103,104,105,106,107)
dob <- c("20/04/2001","29/10/2000","September 1 2012","15/11/00","20.01.1999","April 20th 1999", "04/08/01")
df <- data.frame(ID, dob)

Before doing any analysis, I need to be able to subset the data when it is not in the correct format (i.e. dd/mm/yr) and then correct each cell in turn manually.

I tried using:

df$dob <- strptime(dob, "%d/%m/%Y")

... to highlight which of my dates were in the correct format, but I just get NAs for the dates that were entered incorrectly. That is not helpful if I want to change them afterwards (using the ID as a reference), because the original text is overwritten.

Does anyone have any ideas which may be able to help me?

Solution

Check out the lubridate package.

library(lubridate)
parse_date_time(dob, c("dmy", "Bdy"))
# [1] "2001-04-20 UTC" "2000-10-29 UTC" "2012-09-01 UTC" "0000-11-15 UTC" "1999-01-20 UTC"
# [6] "1999-04-20 UTC" "0001-08-04 UTC"
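If the goal is to pull out the rows that need manual fixing rather than to parse everything automatically, one option is to test the shape of the raw strings and leave the dob column untouched. The following is only a sketch in base R, not part of the answer above: the column name ok and the data frame to_fix are made-up names, and the pattern assumes "correct" means exactly two digits, two digits and a four-digit year separated by slashes. A pattern check is used because a date parser may quietly accept a two-digit year (the output above shows 15/11/00 read as year 0000).

```r
# Rebuild the example data from the question. stringsAsFactors = FALSE keeps
# dob as character on R versions before 4.0.
ID  <- c(101, 102, 103, 104, 105, 106, 107)
dob <- c("20/04/2001", "29/10/2000", "September 1 2012", "15/11/00",
         "20.01.1999", "April 20th 1999", "04/08/01")
df  <- data.frame(ID, dob, stringsAsFactors = FALSE)

# Assumed definition of "correct": dd/mm/yyyy with a four-digit year.
# This checks the shape of the text only, not whether the date is valid.
pattern <- "^[0-9]{2}/[0-9]{2}/[0-9]{4}$"

# Flag each row without overwriting the original strings (ok is a hypothetical name).
df$ok <- grepl(pattern, df$dob)

# Rows that still need manual correction, with ID and original text intact.
to_fix <- df[!df$ok, c("ID", "dob")]
print(to_fix)

# Example of a manual correction by ID. The corrected value is an assumption
# about what participant 103 meant by "September 1 2012".
df$dob[df$ID == 103] <- "01/09/2012"

# Re-check after the correction; once every row passes, the column can be
# converted with as.Date(df$dob, format = "%d/%m/%Y").
df$ok <- grepl(pattern, df$dob)
```

With the example data, to_fix holds IDs 103 to 107, since only the first two entries already have a four-digit year with slashes.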
