R extract time components from semi-standard strings

Views: 24 | Published: 2021/9/7 19:46:31 | Tags: string, r, time, posixct

Problem description

I have a column of durations stored as strings in a data frame. I want to convert them to an appropriate time object, probably POSIXlt. Most of the strings are easy to parse using this method:

> data <- data.frame(time.string = c(
+   "1 d 2 h 3 m 4 s",
+   "10 d 20 h 30 m 40 s",
+   "--"))
> data$time.span <- strptime(data$time.string, "%j d %H h %M m %S s")
> data$time.span
[1] "2012-01-01 02:03:04" "2012-01-10 20:30:40" NA

Missing durations are coded "--" and need to be converted to NA. This already happens, and the behavior should be preserved.

The challenge is that the strings drop zero-valued elements. Thus the desired value 2012-01-01 02:00:14 would be written as the string "1 d 2 h 14 s". However, this string parses to NA with the simple parser, because the fixed format expects an "m" component:

> data2 <- data.frame(time.string = c(
+  "1 d 2 h 14 s",
+  "10 d 20 h 30 m 40 s",
+  "--"))
> data2$time.span <- strptime(data2$time.string, "%j d %H h %M m %S s")
> data2$time.span
[1] NA "2012-01-10 20:30:40" NA

Questions

What is the "R Way" to handle all the possible string formats? Perhaps test for and extract each element individually, then recombine?
Is POSIXlt the right target class? I need duration free from any specific start time, so the addition of false year and month data (2012-01-) is troubling.

Solution

@mplourde definitely had the right idea with dynamically building a format string based on which components appear in the input. Adding cut(Sys.Date(), breaks='years') as the baseline for the date difference was also good, but it fails to account for a critical quirk in as.POSIXct(). Note: I'm using base R 2.11; this may have been fixed in later versions.

The output of as.POSIXct() changes dramatically depending on whether or not a date component is included:

> x <- "1 d 1 h 14 m 1 s"
> y <-     "1 h 14 m 1 s"  # Same string, no date component
> format (x)  # as specified below
[1] "%j d %H h %M m %S s"
> format (y)
[1] "%H h %M m %S s"
> as.POSIXct(x,format=format)  # Including the date baselines at year start
[1] "2012-01-01 01:14:01 EST"
> as.POSIXct(y,format=format)  # Excluding the date baselines at today start
[1] "2012-06-26 01:14:01 EDT"

Thus the second argument for the difftime function should be:

The start of the first day of the current year if the input string has a day component
The start of the current day if the input string does not have a day component
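
The two baselines can be checked directly. The following is a minimal sketch, not part of the original post: the variable names are made up for illustration, and it assumes a current version of R, where strptime() fills in an unspecified date with the current one. Both differences should come out at the same 1 h 14 m 1 s.

```r
# Illustrative inputs using the formats discussed above (names are hypothetical)
with.day    <- as.POSIXct("1 d 1 h 14 m 1 s", format = "%j d %H h %M m %S s")
without.day <- as.POSIXct("1 h 14 m 1 s",     format = "%H h %M m %S s")

# cut() on a Date returns a factor; its label is the start of the period
year.start <- as.POSIXct(as.character(cut(Sys.Date(), breaks = "years")))
day.start  <- as.POSIXct(as.character(cut(Sys.Date(), breaks = "days")))

# Each parsed value is measured against the baseline that matches its format
d.with    <- difftime(with.day,    year.start, units = "hours")
d.without <- difftime(without.day, day.start,  units = "hours")
print(d.with)
print(d.without)
```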

This can be accomplished by changing the breaks argument passed to cut():

parse.time <- function (x) {
  x <- as.character (x)
  break.unit <- ifelse(grepl("d",x),"years","days")  # chooses cut() unit
  format <- paste(c(if (grepl("d", x)) "%j d",
                    if (grepl("h", x)) "%H h",
                    if (grepl("m", x)) "%M m",
                    if (grepl("s", x)) "%S s"), collapse=" ")

  if (nchar(format) > 0) {
    difftime(as.POSIXct(x, format=format), 
             cut(Sys.Date(), breaks=break.unit),
             units="hours")
  } else {NA}

}

Recommended answer

difftime objects are time duration objects that can be added to either POSIXct or POSIXlt objects. Maybe you want to use this instead of POSIXlt? 
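
As a small illustration of that point (a sketch with an arbitrary example instant, not taken from the question), a difftime carries no calendar date of its own and can be added to a POSIXct value:

```r
# "1 d 2 h" expressed as a pure duration of 26 hours
span <- as.difftime(26, units = "hours")

# Arbitrary starting instant, chosen only for this example
start <- as.POSIXct("2012-06-26 00:00:00", tz = "UTC")

# Adding a difftime to a POSIXct shifts the instant by that duration
end <- start + span
print(end)

# The same duration can be re-expressed in other units
units(span) <- "mins"
print(span)
```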

Regarding the conversion from strings to time objects, you could do something like this:

data <- data.frame(time.string = c(
    "1 d 1 h",
    "30 m 10 s",
    "1 d 2 h 3 m 4 s",
    "2 h 3 m 4 s",
    "10 d 20 h 30 m 40 s",
    "--"))

f <- function(x) {
    x <- as.character(x)
    format <- paste(c(if (grepl('d', x)) '%j d',
                      if (grepl('h', x)) '%H h',
                      if (grepl('m', x)) '%M m',
                      if (grepl('s', x)) '%S s'), collapse=' ')

    if (nchar(format) > 0) {
        if (grepl('%j d', format)) {
            # '%j 1' is day 0. We add a day so that x = '1 d' means 24hrs.
            difftime(as.POSIXct(x, format=format) + as.difftime(1, units='days'), 
                    cut(Sys.Date(), breaks='years'),
                    units='hours')
        } else {
            as.difftime(x, format, units='hours')
        }
    } else { NA }
}

data$time.span <- sapply(data$time.string, FUN=f)
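
The question also asks whether each element could be tested for and extracted individually, then recombined. Below is a minimal sketch of that idea. It is an alternative to the code above, not the answerer's method: parse.duration is a hypothetical helper name, and it assumes each component is written as digits followed by one of the unit letters d, h, m, s. Because it never goes through strptime(), no year, month, or day baseline is involved.

```r
# Hypothetical helper: sum the components of strings such as "1 d 2 h 14 s"
parse.duration <- function(x) {
  x <- as.character(x)
  # Seconds per unit; the names are the unit letters used in the strings
  unit.secs <- c(d = 86400, h = 3600, m = 60, s = 1)
  total <- rep(0, length(x))
  found <- rep(FALSE, length(x))
  for (u in names(unit.secs)) {
    # Capture the digits in front of the unit letter, e.g. "14" from "14 s"
    pattern <- paste0("(^| )([0-9]+) *", u, "( |$)")
    matches <- regmatches(x, regexec(pattern, x))
    # Element 3 of a match is the digit group; no match yields NA
    value <- vapply(matches,
                    function(g) if (length(g)) as.numeric(g[3]) else NA_real_,
                    numeric(1))
    found <- found | !is.na(value)
    total <- total + ifelse(is.na(value), 0, value * unit.secs[[u]])
  }
  total[!found] <- NA  # strings with no components, such as "--"
  as.difftime(total, units = "secs")
}

spans <- parse.duration(c("1 d 2 h 14 s", "10 d 20 h 30 m 40 s", "--"))
print(spans)
```

The result is a difftime vector in seconds. In this sketch "1 d" counts as a full 24 hours, and the result can be switched to other units with units(spans) <- "hours".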

