Converting a character string into a date in R
Question
The data I'm trying to convert is supposed to be a date, but it is formatted as mmddyyyy with no dashes or slashes as separators. In order to work with dates in R, I would like to have this formatted as mm-dd-yyyy or mm/dd/yyyy.
I think I might need to use grep(), but I'm not sure how to use it to reformat all of the dates that are in the mmddyyyy format.
Recommended answer
Updated: improved with @Richard Scriven's colClasses and simpler as.Date() suggestions.
Here are two similar methods that worked for me, going from a csv containing dates in mmddyyyy format to having them recognized by R as date objects.
Starting with a simple file, tv.csv:
Series,FirstAir
Quantico,09272015
Muppets,09222015
Method 1: All as string
Once within R,
> t = read.csv('tv.csv', colClasses = 'character')
- imports tv.csv as a data frame named t
- the colClasses = 'character' option causes all the data to be read as the character data type (instead of the Factor and int types)
Checking its initial structure:
> str(t)
'data.frame': 2 obs. of 2 variables:
$ Series : chr "Quantico" "Muppets"
$ FirstAir: chr "09272015" "09222015"
- R has imported everything as strings of characters, indicated here as type chr
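As a side note, colClasses also accepts a named vector, so a single column can be forced to character while the others are still auto-detected. The sketch below is only an illustration: the inline text= copy of tv.csv and the data frame name tv are assumptions made to keep it self-contained.

```r
# Inline copy of tv.csv so the example runs without a file on disk
csv_text <- "Series,FirstAir
Quantico,09272015
Muppets,09222015"

# Force only FirstAir to character; Series is left to R's type detection
tv <- read.csv(text = csv_text, colClasses = c(FirstAir = "character"))

str(tv$FirstAir)  # the leading zeros are preserved
```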
The chr values (strings of characters) are then easily converted into dates:
> t$FirstAir = as.Date(t$FirstAir, "%m%d%Y")
- as.Date() performs the string-to-date conversion
- %m%d%Y specifies how to interpret the input in t$FirstAir. These format codes, at least on Linux, can be found by running $ man date, which brings up the manual for the date program, where there is a list of formatting codes. For example, it says %m month (01..12)
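To illustrate the format codes, here is a short sketch that parses mmddyyyy strings and then prints them back as mm/dd/yyyy or mm-dd-yyyy, the layouts asked for in the question. The sample values are taken from tv.csv; the variable names raw and dates are made up for the example.

```r
raw <- c("09272015", "09222015")

# %m = month (01-12), %d = day of month (01-31), %Y = four-digit year
dates <- as.Date(raw, format = "%m%d%Y")

# A Date prints as yyyy-mm-dd by default; format() renders other layouts
format(dates, "%m/%d/%Y")
format(dates, "%m-%d-%Y")

# A string that does not form a valid date becomes NA instead of raising an error
as.Date("13272015", format = "%m%d%Y")
```

Note that format() returns character strings again, so it is best applied only when displaying or exporting; keep the column as Date while working with it.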
Method 2: Fix only the date variable
If for some reason you don't want a blanket conversion of everything to character on import, for example with a file that has many variables where you wish to leave R's automatic type recognition in use and merely "fix" the one date variable, follow this method.
Once within R,
> t = read.csv('tv.csv')
- imports tv.csv as a data frame named t
Checking its initial structure:
> str(t)
'data.frame': 2 obs. of 2 variables:
$ Series : Factor w/ 2 levels "Muppets","Quantico": 2 1
$ FirstAir: int 9272015 9222015
>
- R tries its best to guess the type of each variable
- As you can see, there is an immediate problem: for the FirstAir variable, R has imported 09272015 as int, meaning integer, and dropped the leading zero. The 0 in 09 is needed later for the date conversion, yet R has imported the value without it. So we need to fix this.
This can be done in a single command, but for clarity I have broken it into two steps. First,
> t$FirstAir = sprintf("%08d", t$FirstAir)
- sprintf is a formatting function
- 0 means pad with zeroes
- 8 means ensure 8 characters, because mmddyyyy is 8 characters in total
- d is used when the input is a number, which it currently is; recall that the str() output reported t$FirstAir as int, meaning integer
- t$FirstAir is the variable we are both setting and using as input
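A quick sketch of what the "%08d" format string does, using literal integers in place of the data frame column (the vector x and its third value are made up for illustration):

```r
x <- c(9272015L, 9222015L, 12252015L)

# "%08d": format as a decimal integer, zero-padded to a width of 8
padded <- sprintf("%08d", x)

padded
nchar(padded)  # every element is now 8 characters long
```

Values that already have 8 digits, such as December dates, pass through unchanged, so the padding is safe to apply to the whole column.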
Checking the results:
> str(t$FirstAir)
chr [1:2] "09272015" "09222015"
- it was successfully converted from an int to a chr type; for example, 9272015 became "09272015"
Now that it is a string (chr type), we can convert it, the same as in method 1.
> t$FirstAir = as.Date(strptime(t$FirstAir, "%m%d%Y"))
Result
We do a final check:
> str(t$FirstAir)
Date[1:2], format: "2015-09-27" "2015-09-22"
In both cases, the original values from the text file have now been successfully converted into R date objects.
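For reference, the two steps of the second method can be combined into one expression, as mentioned earlier. This is only a sketch: it uses an inline copy of tv.csv and the hypothetical name tv for the data frame.

```r
csv_text <- "Series,FirstAir
Quantico,09272015
Muppets,09222015"

tv <- read.csv(text = csv_text)  # FirstAir is read as integer here

# Pad back to 8 digits, then parse as month-day-year in one step
tv$FirstAir <- as.Date(sprintf("%08d", tv$FirstAir), format = "%m%d%Y")

str(tv$FirstAir)
```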