Dealing with durations defined by days, hours, minutes and seconds such as "1d 3h 2m 28s" in R

Question

I have a data frame with character vectors in the format with days, hours, minutes and seconds represented like "1d 3h 2m 28s":

> head(status[5])
    Duration 
1 0d 20h 46m 31s 
2  2d  0h 13m 54s
3  2d  0h 13m 53s
4  0d  9h 53m 38s
5  5d 12h 17m 37s
6  0d 10h 21m 19s

I can parse it with regex for the components but cannot come up with a good way to convert the duration into seconds. I can gsub the vectors into an expression that would result in the number of seconds but hit a road block with using eval on the results.

I could do something similar to what was recommended here but hoped to follow the regex route - even if it isn't the most efficient. I'm only dealing with parsing a variety of small HTML tables.

status$duration <- gsub("(\\d+)d\\s+(\\d+)h\\s+(\\d+)m\\s+(\\d+)s.*",
                        "\\1*86400+\\2*3600+\\3*60+\\4",
                        as.character(status[, 5]), perl = TRUE)

The above creates an expression that can be evaluated but I'm missing something when it comes to parse(text=status$duration) and a subsequent eval.

In perl, I'm accustomed to taking the "captured variables" in the regex expression and immediately using them rather than only within a replacement string. Are there similar possibilities in R?

Thank you, I'm probably missing something very simple due to fogginess of mind.

Recommended answer

The first and last solutions below seem the simplest but the ones with complex regexps correspond more closely to what might have been done in perl.

Before listing the solutions themselves, note that they all assume the input is tt and that the conversion vector mult is a 4-vector whose components are the number of seconds in a day, an hour, a minute and a second. We can set mult as in the comment or calculate it as shown:

tt <- c("0d 20h 46m 31s", "2d 0h 13m 54s", "2d 0h 13m 53s", 
   "0d 9h 53m 38s", "5d 12h 17m 37s", "0d 10h 21m 19s")
# mult <- c(86400, 3600, 60, 1)
mult <- rev(cumprod(rev(c(24, 60, 60, 1))))
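
As a quick sanity check on the conversion vector, the sketch below confirms that the computed mult matches the hard-coded one and works the first duration by hand. It uses only base R; the names mult_literal and first are chosen here purely for illustration.

```r
# Seconds per day, hour, minute and second, built from the unit ratios
mult <- rev(cumprod(rev(c(24, 60, 60, 1))))

# The same vector written out literally (illustrative name)
mult_literal <- c(86400, 3600, 60, 1)
stopifnot(all(mult == mult_literal))

# Worked example for "0d 20h 46m 31s" (illustrative name)
first <- c(0, 20, 46, 31)
first_secs <- sum(first * mult)  # 0*86400 + 20*3600 + 46*60 + 31 = 74791
first_secs
```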

Here are 4 approaches:

1) strapply extracting numerics. We can use strapply in the gsubfn package to avoid complex regular expressions. strapply extracts all the numbers and arranges them in a matrix; multiplying by mult then gives the result, which c() strings out into a plain numeric vector:

library(gsubfn)
mat <- strapply(tt, "\\d+", as.numeric, simplify = TRUE)
secs <- c(mult %*% mat)

The two lines could be combined into a single statement but we will leave it as above in case you wish to examine mat separately.

2) strapply with complex regexp. Another possibility, also using strapply, is the following single statement. The captured strings are placed into the free variables of the formula in the order they are encountered, so the first capture goes into day, the second into hour, and so on. This one may be closer to what you would have done in perl:

secs <- strapply(tt, "(\\d+)d (\\d+)h (\\d+)m (\\d+)s", 
 ~ 86400 * as.numeric(day) + 3600 * as.numeric(hour) + 
    60 * as.numeric(minute) + as.numeric(second), simplify = TRUE)

3) strapply with complex regexp, but vectorized. Even shorter:

secs <- strapply(tt, "(\\d+)d (\\d+)h (\\d+)m (\\d+)s", 
  ~ as.numeric(list(...)) %*% mult, simplify = TRUE)

4) strsplit. Here is another single-statement answer. This one does not use strapply; instead it relies on the fact that a matching separator at the end of the string is simply removed, without producing a trailing empty string. See ?strsplit for details.

secs <- sapply(strsplit(tt, "[dhms]"), function(x) as.numeric(x) %*% mult)
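
The strsplit behaviour that solution 4 relies on can be seen on a single string. This is a minimal base R sketch; the names parts and nums are illustrative, and it assumes all four fields (d, h, m, s) are present in every input.

```r
# Split on any of the unit letters; the trailing "s" is removed without
# leaving an empty fifth piece behind
parts <- strsplit("0d 20h 46m 31s", "[dhms]")[[1]]
length(parts)  # 4

# The pieces after the first keep a leading space, which as.numeric() tolerates
nums <- as.numeric(parts)
nums
```

If a field could be missing (for example "3h 2m"), the pieces would no longer line up with the four entries of mult, so this approach would need an extra check.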

The result from any of the above is:

> secs
[1]  74791 173634 173633  35618 476257  37279
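
For readers who would rather not depend on gsubfn, the following is a hedged base R sketch of two further routes; neither is part of the answer above. Route (a) pulls out the capture groups with regexec and regmatches, which is roughly the base R counterpart of using captured variables in perl. Route (b) finishes the gsub approach from the question by evaluating each parsed expression separately. The names pat, caps, exprs, secs_caps and secs_eval are illustrative.

```r
tt <- c("0d 20h 46m 31s", "2d 0h 13m 54s", "2d 0h 13m 53s",
        "0d 9h 53m 38s", "5d 12h 17m 37s", "0d 10h 21m 19s")
mult <- c(86400, 3600, 60, 1)
pat <- "(\\d+)d\\s+(\\d+)h\\s+(\\d+)m\\s+(\\d+)s"

# (a) Capture groups: regexec() records the full match and each group;
# regmatches() returns them as one character vector per input string.
caps <- regmatches(tt, regexec(pat, tt))
# Drop the full match (first element) and weight the four groups by mult
secs_caps <- sapply(caps, function(m) sum(as.numeric(m[-1]) * mult))

# (b) The question's route: turn each string into an arithmetic expression,
# then evaluate the expressions one at a time. Calling eval() on the whole
# expression vector would return only the value of the last element.
exprs <- gsub(pat, "\\1*86400+\\2*3600+\\3*60+\\4", tt)
secs_eval <- sapply(parse(text = exprs), eval)

secs_caps
secs_eval
```

Route (b) evaluates text as code, so it is only reasonable when the input is known to match the pattern; route (a) avoids that concern.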
