R: transform irregular time strings
Problem description
I have two time series from different data frames. Each uses a different irregular format, but the problem is the same: I want to extract only the hours, minutes, seconds and milliseconds.
The time series look like this:
ts1
08:27:23,445
08:27:24,280
08:27:25,115
...
I tried:
strptime("08:27:23,445", "%H:%M:%OS")
[1] "2013-05-23 08:27:23"
This loses the millisecond information and adds date information that is useless to me.
ts2
Fri Apr 19 2013 08:39:41 GMT+0200
Fri Apr 19 2013 08:39:43 GMT+0200
Fri Apr 19 2013 08:39:45 GMT+0200
...
I tried:
strptime("Fri Apr 19 2013 08:39:41 GMT+0200", "%a %b %d %Y %H:%M:%S %Z")
[1] NA
In the end, I want to transform ts1 and ts2 each into a new time series with the same format (including milliseconds), for example:
ts1
08:27:23,445
ts2
08:39:41,000
Having the same format matters because I want to work with both time series later on, e.g. match them, calculate differences, etc.
Thanks for your help!
Update: added dput output
Both datasets are very long, which is why I cut them down.
ts1
structure(list(t = structure(1:9, .Label = c("08:27:23,445",
"08:27:24,280", "08:27:25,115", "08:27:25,960", "08:27:26,780",
"08:27:27,540", "08:27:28,295", "08:27:29,075", "08:27:29,910"), class = "factor")), .Names = "t", row.names = c(NA, -9L
), class = "data.frame")
ts2
structure(list(t = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 6L, 7L,
8L), .Label = c("Fri Apr 19 2013 08:39:41 GMT+0200", "Fri Apr 19 2013 08:39:43 GMT+0200",
"Fri Apr 19 2013 08:39:45 GMT+0200", "Fri Apr 19 2013 08:39:49 GMT+0200",
"Fri Apr 19 2013 08:39:51 GMT+0200", "Fri Apr 19 2013 08:39:53 GMT+0200",
"Fri Apr 19 2013 08:39:59 GMT+0200", "Fri Apr 19 2013 08:40:05 GMT+0200",
"Fri Apr 19 2013 08:40:06 GMT+0200"
), class = "factor")), .Names = "t", row.names = c(NA, -9L), class = "data.frame")
Recommended answer
Below is a quick sapply-based approach that might help, IF you have a fixed zero point. For example, if you only want to compare activities from 0:00 (midnight) until 23:59:59,999 on the same day. If so, you can convert each time into another unit (minutes in my example) and then see how long, say, a single activity takes.
Using your example for ts1:
Make a vector of times (as characters):
time <- c("08:27:23,445",
"08:27:24,280",
"08:27:25,115")
Change the comma to a colon, so that every field can be split on the same separator:
time.new <- gsub(",", ":", time)
Compute decimal minutes since midnight:
time.mins <- sapply(strsplit(as.character(time.new), ":"),
                    function(x) {
                      # x holds hours, minutes, seconds, milliseconds
                      x <- as.numeric(x)
                      x[1]*60 + x[2] + x[3]/60 + x[4]/60000
                    })
The result looks like this if you bind the two vectors together (note that cbind on a character and a numeric vector yields a character matrix, not a data frame):
> df <- cbind(time, time.mins)
> df
time time.mins
[1,] "08:27:23,445" "507.39075"
[2,] "08:27:24,280" "507.404666666667"
[3,] "08:27:25,115" "507.418583333333"
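The same idea can be extended to ts2. What follows is only a minimal sketch, assuming that every ts2 string contains exactly one HH:MM:SS field and that the date and the GMT offset can be ignored (that is, both series fall on the same day and in the same time zone). The helper name to_ms is hypothetical, and it works in milliseconds since midnight rather than decimal minutes to avoid fractional values:

```r
# Hypothetical helper: convert "HH:MM:SS" or "HH:MM:SS,mmm"
# to milliseconds since midnight.
to_ms <- function(x) {
  x <- as.character(x)                 # also handles factor columns
  parts <- strsplit(x, "[:,]")         # split on colon or comma
  sapply(parts, function(p) {
    p <- as.numeric(p)
    if (length(p) == 3) p <- c(p, 0)   # assumption: no ms field means 0 ms
    p[1] * 3600000 + p[2] * 60000 + p[3] * 1000 + p[4]
  })
}

ts1 <- c("08:27:23,445", "08:27:24,280", "08:27:25,115")
ts2 <- c("Fri Apr 19 2013 08:39:41 GMT+0200",
         "Fri Apr 19 2013 08:39:43 GMT+0200")

# Pull the HH:MM:SS field out of the longer ts2 strings
ts2.clock <- regmatches(ts2, regexpr("[0-9]{2}:[0-9]{2}:[0-9]{2}", ts2))

ms1 <- to_ms(ts1)
ms2 <- to_ms(ts2.clock)

diff(ms1)   # gaps between consecutive ts1 readings, in milliseconds
```

With both series on the same numeric scale, differences and matching reduce to ordinary arithmetic on ms1 and ms2.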
I imagine this is most useful for something like click-through rates, or whenever you never care about a total gap of more than 24 hours.
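If the common "HH:MM:SS,mmm" text format from the question is also needed, a numeric time can be turned back into a string. This is a sketch under the assumption that the input is a count of milliseconds since midnight and is below 24 hours; the function name format_ms is hypothetical:

```r
# Hypothetical helper: format milliseconds since midnight as "HH:MM:SS,mmm".
format_ms <- function(ms) {
  ms <- round(ms)
  h  <- ms %/% 3600000               # whole hours
  m  <- (ms %% 3600000) %/% 60000    # whole minutes within the hour
  s  <- (ms %% 60000) %/% 1000       # whole seconds within the minute
  f  <- ms %% 1000                   # remaining milliseconds
  sprintf("%02d:%02d:%02d,%03d",
          as.integer(h), as.integer(m), as.integer(s), as.integer(f))
}

out <- format_ms(c(30443445, 31181000))
out
```

Keeping the numeric values for calculations and formatting only for display avoids parsing the strings repeatedly.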