Subset dates in the wrong format in R
I have questionnaire data in which participants entered their date of birth in a wide variety of formats:
ID <- c(101,102,103,104,105,106,107)
dob <- c("20/04/2001","29/10/2000","September 1 2012","15/11/00","20.01.1999","April 20th 1999", "04/08/01")
df <- data.frame(ID, dob)
Before doing any analysis, I need to subset the rows where the date is not in the correct format (i.e. dd/mm/yr) and then correct each cell manually.
I tried using:
df$dob <- strptime(dob, "%d/%m/%Y")
... to highlight which of my dates are in the correct format, but I just get NA for the dates that were entered incorrectly. That is not helpful if I want to change them afterwards (using the ID as a reference), because the original text is overwritten.
Does anyone have any ideas which may be able to help me?
Check out the lubridate package.
library(lubridate)
parse_date_time(dob, c("dmy", "Bdy"))
# [1] "2001-04-20 UTC" "2000-10-29 UTC" "2012-09-01 UTC" "0000-11-15 UTC" "1999-01-20 UTC"
# [6] "1999-04-20 UTC" "0001-08-04 UTC"
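Note that the output above gives the years 0000 and 0001 for the two entries with two-digit years, so those still need manual attention. If the goal is to list the rows that need fixing rather than to parse everything automatically, one option is to keep the original strings and flag the mismatches in base R. The following is only a minimal sketch: it assumes that "correct" means exactly dd/mm/yyyy with a four-digit year, and the names `looks_ok`, `dob_parsed` and `to_fix` are made up for illustration.

```r
ID  <- c(101, 102, 103, 104, 105, 106, 107)
dob <- c("20/04/2001", "29/10/2000", "September 1 2012", "15/11/00",
         "20.01.1999", "April 20th 1999", "04/08/01")
df  <- data.frame(ID, dob, stringsAsFactors = FALSE)

# Strict shape check: two digits / two digits / four digits.
# Assumption: only a four-digit year counts as correct. A format of
# "%d/%m/%Y" on its own may accept a short year such as "00" as year 0.
looks_ok <- grepl("^[0-9]{2}/[0-9]{2}/[0-9]{4}$", df$dob)

# Parse into a NEW column so the original strings stay available
# for manual correction.
df$dob_parsed <- as.Date(df$dob, format = "%d/%m/%Y")
df$dob_parsed[!looks_ok] <- NA

# Rows that still need fixing by hand, identified by ID.
to_fix <- df[is.na(df$dob_parsed), c("ID", "dob")]
print(to_fix)
```

With the sample data, `to_fix` contains IDs 103 to 107. Because the parsed dates go into a separate column, each entry can be corrected in `df$dob` by ID and the check simply rerun. Combining the pattern test with `as.Date()` also catches strings that have the right shape but are not real calendar dates.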