New column from non-standard date factor in R

Problem description

I have a dataframe with an oddly formatted dates column. I'd like to create a column just showing the year from the original date column and I am having trouble coming up with a way to do this because the current date column is being treated as a factor. Any advice on how to do this efficiently would be appreciated.

Example
starting with:

org <- c("a","b","c","d")
country <- c("1","2","3","4")
date <- c("01-09-14","01-10-07","11-31-99","10-31-12")
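# R >= 4.0 no longer converts strings to factors by default, so request it explicitly
toy <- data.frame(cbind(org,country,date), stringsAsFactors = TRUE)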
toy
  org country     date
1   a       1 01-09-14
2   b       2 01-10-07
3   c       3 11-31-99
4   d       4 10-31-12

str(toy$date)
Factor w/ 4 levels "01-09-14","01-10-07",..: 1 2 4 3

Desired result:

  org country     Year
1   a       1     2014
2   b       2     2007
3   c       3     1999
4   d       4     2012

Solution

This should work:

transform(toy, Year = format(strptime(date, "%m-%d-%y"), "%Y"))

This produces

##   org country     date Year
## 1   a       1 01-09-14 2014
## 2   b       2 01-10-07 2007
## 3   c       3 11-31-99 <NA>
## 4   d       4 10-31-12 2012

I initially thought that the NA value was because the %y format indicator wasn't smart enough to handle previous-century dates, but ?strptime says:

‘%y’ Year without century (00-99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 - that is the behaviour specified by the 2004 and 2008 POSIX standards, but they do also say ‘it is expected that in a future version the default century inferred from a 2-digit year will change’.

implying that it should be able to handle it.
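
As a quick check of the quoted rule, here is a small sketch using made-up dates (not taken from the question) on either side of the 68/69 cutoff. The vector names `two_digit` and `years` are purely illustrative.

```r
# Hypothetical inputs chosen to straddle the documented 68/69 cutoff
two_digit <- c("01-01-68", "01-01-69", "11-30-99")

# Parse as month-day-2-digit-year, then format as a 4-digit year
years <- format(as.Date(two_digit, format = "%m-%d-%y"), "%Y")
years
```

With the behaviour described in ?strptime, this should give "2068", "1969" and "1999", so a 1999 date on its own is not what triggers the NA.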

The problem is actually that 31 November doesn't exist, so that row cannot be parsed as a date and comes back as NA ...
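
One way to confirm this, and still get the 1999 that the question asks for, is to flag the rows that fail to parse and to read the year from the last two characters of the string instead. This is only a sketch and not part of the original answer: the names `raw`, `parsed`, `yy` and `year_from_text` are illustrative, and the 68/69 cutoff is copied by hand from the ?strptime text quoted above.

```r
# Same date strings as in the question, kept as plain character data
raw <- c("01-09-14", "01-10-07", "11-31-99", "10-31-12")

# Strings that do not parse are not valid calendar dates
parsed <- as.Date(raw, format = "%m-%d-%y")
raw[is.na(parsed)]   # "11-31-99"

# Take the 2-digit year straight from the text and apply the 68/69 rule manually
yy <- as.integer(substr(raw, 7, 8))
year_from_text <- ifelse(yy <= 68, 2000 + yy, 1900 + yy)
year_from_text       # 2014 2007 1999 2012
```

If the column is a factor, as in the question, wrap it in `as.character()` before calling `substr()`. Whether an impossible date should still yield a year at all is a data-cleaning decision; this approach simply trusts the text.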

(You can drop the date column at your leisure ...)
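
For completeness, a minimal sketch of dropping the column once `Year` has been added. The data frame is rebuilt here so the snippet runs on its own, and assigning `NULL` is just one of several ways to remove a column.

```r
# Rebuild the question's data frame (character columns work fine here)
toy <- data.frame(org = c("a", "b", "c", "d"),
                  country = c("1", "2", "3", "4"),
                  date = c("01-09-14", "01-10-07", "11-31-99", "10-31-12"))

# Add Year as in the answer, then remove the original date column
toy <- transform(toy, Year = format(strptime(date, "%m-%d-%y"), "%Y"))
toy$date <- NULL
names(toy)   # "org" "country" "Year"
```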
