Converting a character string into a date in R


Question

The data I'm trying to convert is supposed to be a date, however it is formatted as mmddyyyy with no separation by dashes or slashes. In order to work with dates in R, I would like to have this formatted as mm-dd-yyyy or mm/dd/yyyy.

I think I might need to use grep(), but I'm not sure how to use it to reformat all of the dates that are in the mmddyyyy format.

Recommended answer

Updated: Improved with @Richard Scriven's colClasses and simpler as.Date() suggestions

Here are two similar methods that worked for me, going from a csv containing dates in mmddyyyy format to having them recognized by R as date objects.

Starting with a simple file, tv.csv:

Series,FirstAir
Quantico,09272015
Muppets,09222015



Method 1: All as string

Once within R,

> t = read.csv('tv.csv', colClasses = 'character')




  • imports tv.csv as a data frame named t
  • the colClasses = 'character' option causes all the data to be read as the character data type (instead of as Factor or int types)

Checking its initial structure:

> str(t)
'data.frame':   2 obs. of  2 variables:
 $ Series  : chr  "Quantico" "Muppets"
 $ FirstAir: chr  "09272015" "09222015"
      




  • R has imported everything as character strings, indicated here as type chr
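Method 1's import step can be tried without an existing file. The following is a minimal sketch that recreates the two-row tv.csv shown above in a temporary file. The names csv_path and tv are illustrative and not from the original answer (tv is used rather than t to avoid masking R's t() function).

```r
# Recreate the sample tv.csv in a temporary file (path is illustrative)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Series,FirstAir",
             "Quantico,09272015",
             "Muppets,09222015"), csv_path)

# Read every column as character so the leading zeros are preserved
tv <- read.csv(csv_path, colClasses = "character")

# Inspect the structure; both columns should be reported as chr
str(tv)
```

Because nothing is converted on import, the FirstAir values keep all eight characters.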
The chr strings are then easily converted into dates:

> t$FirstAir = as.Date(t$FirstAir, "%m%d%Y")
          




  • as.Date() performs the string-to-date conversion
  • %m%d%Y specifies how to interpret the input in t$FirstAir. These format codes, at least on Linux, can be found by running $ man date, which brings up the manual for the date program, where there is a list of format codes. For example, it says %m month (01..12)
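As a small standalone sketch of the conversion described above, the format string can be tried on a plain character vector. The vector name first_air is illustrative. Within R itself, the same format codes are also documented on the ?strptime help page.

```r
# Two dates stored as mmddyyyy strings, as in tv.csv
first_air <- c("09272015", "09222015")

# %m = two-digit month, %d = two-digit day, %Y = four-digit year
dates <- as.Date(first_air, format = "%m%d%Y")

class(dates)   # "Date"
format(dates)  # "2015-09-27" "2015-09-22"
```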
Method 2: Fix only the date variable

If for some reason you don't want a blanket conversion of every column to character on import, for example with a file that has many variables where you wish to leave R's automatic type recognition in use and merely "fix" the one date variable, follow this method.

Once within R,

> t = read.csv('tv.csv')
              




  • imports tv.csv as a data frame named t

Checking its initial structure:

> str(t)
'data.frame':   2 obs. of  2 variables:
 $ Series  : Factor w/ 2 levels "Muppets","Quantico": 2 1
 $ FirstAir: int  9272015 9222015
                  




  • R tries its best to guess the type of each variable
  • As you can see, there is an immediate problem: R has imported the FirstAir value 09272015 as an int (integer) and dropped the leading zero. The 0 in 09 matters later for the date conversion, so we need to fix this.
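The lost leading zero can be reproduced with a short sketch that again writes the sample data to a temporary file. The names csv_path and tv are illustrative. Note that in R 4.0.0 and later the Series column is read as chr rather than Factor, because the default of stringsAsFactors changed to FALSE. The FirstAir column is still read as an integer.

```r
# Recreate the sample tv.csv in a temporary file (path is illustrative)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Series,FirstAir",
             "Quantico,09272015",
             "Muppets,09222015"), csv_path)

# Default import: R guesses the type of each column
tv <- read.csv(csv_path)

class(tv$FirstAir)  # "integer"
tv$FirstAir         # 9272015 9222015, the leading zeros are gone
```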
This can be done in a single command, but for clarity I have broken it into two steps. First,

> t$FirstAir = sprintf("%08d", t$FirstAir)
                      




  • sprintf is a formatting function
  • 0 means pad with zeroes
  • 8 means ensure 8 characters, because mmddyyyy is 8 characters in total
  • d is used when the input is a number, which it currently is; recall that the str() output reported t$FirstAir as an int, meaning integer
  • t$FirstAir is the variable we are both setting and using as input
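To see the padding in isolation, here is a brief sketch on a standalone integer vector. The vector first_air and its third value (a date whose month already has two digits) are made up for illustration.

```r
# Integers as R imported them; the third already has eight digits
first_air <- c(9272015L, 9222015L, 12252015L)

# Pad to a width of 8 with leading zeros; the result is a character vector
padded <- sprintf("%08d", first_air)

padded         # "09272015" "09222015" "12252015"
nchar(padded)  # 8 8 8
```

Values that already have eight digits pass through unchanged, so the same call is safe for every row.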
Checking the result:

> str(t$FirstAir)
 chr [1:2] "09272015" "09222015"
                          




  • it was successfully converted from an int to a chr type; for example, 9272015 became "09272015"

Now that it is a string (chr type), we can convert it, the same as in method 1.

> t$FirstAir = as.Date(strptime(t$FirstAir, "%m%d%Y"))
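The answer mentions that the two steps can be combined into a single command. One possible form is sketched here on a standalone integer vector (first_air is an illustrative name). It passes the format straight to as.Date(), as in method 1, rather than going through strptime().

```r
# Integers as imported by read.csv with default settings
first_air <- c(9272015L, 9222015L)

# Pad to 8 characters and parse as a date in one expression
dates <- as.Date(sprintf("%08d", first_air), format = "%m%d%Y")

str(dates)
```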
                              



Result

We do a final check:

> str(t$FirstAir)
 Date[1:2], format: "2015-09-27" "2015-09-22"
                              

In both cases, the original values from the text file have now been successfully converted into R date objects.
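The question also asked for the dates to appear as mm-dd-yyyy or mm/dd/yyyy. R prints Date objects as yyyy-mm-dd by default, but format() can render them in either layout as character strings. This is a sketch that goes slightly beyond the original answer, and the vector name dates is illustrative.

```r
# Date objects like those produced by either method above
dates <- as.Date(c("09272015", "09222015"), format = "%m%d%Y")

# Render as character strings in the layouts the question asked for
format(dates, "%m-%d-%Y")  # "09-27-2015" "09-22-2015"
format(dates, "%m/%d/%Y")  # "09/27/2015" "09/22/2015"
```

The formatted values are plain strings again, so it is usually best to keep the Date column for calculations and format it only for display.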
