R: transform irregular time strings


Question

I have two time series from different data frames. Each uses a different irregular format, but the problem is the same: I want to extract only the hours, minutes, seconds and milliseconds.

The time series look like this:

ts1

08:27:23,445
08:27:24,280
08:27:25,115
...

I tried:

strptime("08:27:23,445", "%H:%M:%OS")
[1] "2013-05-23 08:27:23"

I lose the millisecond information and get date information that is useless to me.

ts2

Fri Apr 19 2013 08:39:41 GMT+0200
Fri Apr 19 2013 08:39:43 GMT+0200
Fri Apr 19 2013 08:39:45 GMT+0200
...

I tried:

strptime("Fri Apr 19 2013 08:39:41 GMT+0200", "%a %b %d %Y %H:%M:%S %Z")
[1] NA

In the end, I want to transform ts1 and ts2 each into a new time series with the same format (including milliseconds), for example:

ts1

08:27:23,445

ts2

08:39:41,000

Having the same format matters to me because I want to work with the two time series later on, e.g. matching them, calculating differences, and so on.

Thanks for your help!

Update: added dput

Both datasets are very long, which is why I cut them down.

ts1

structure(list(t = structure(1:9, .Label = c("08:27:23,445", 
                                                   "08:27:24,280", "08:27:25,115", "08:27:25,960", "08:27:26,780", 
                                                   "08:27:27,540", "08:27:28,295", "08:27:29,075", "08:27:29,910"), class = "factor")), .Names = "t", row.names = c(NA, -9L
                                                   ), class = "data.frame")

ts2

structure(list(t = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 6L, 7L, 
                           8L), .Label = c("Fri Apr 19 2013 08:39:41 GMT+0200", "Fri Apr 19 2013 08:39:43 GMT+0200", 
                                           "Fri Apr 19 2013 08:39:45 GMT+0200", "Fri Apr 19 2013 08:39:49 GMT+0200", 
                                           "Fri Apr 19 2013 08:39:51 GMT+0200", "Fri Apr 19 2013 08:39:53 GMT+0200", 
                                           "Fri Apr 19 2013 08:39:59 GMT+0200", "Fri Apr 19 2013 08:40:05 GMT+0200", 
                                           "Fri Apr 19 2013 08:40:06 GMT+0200"
                           ), class = "factor")), .Names = "t", row.names = c(NA, -9L), class = "data.frame")

Recommended answer

Below is a quick sapply-based function that might help, IF you have a fixed zero point. For example, if you only want to compare activities from 0:00 (midnight) until 23:59:59,999 on the same day. If so, you can convert the time into another unit (minutes in my example) and see how long, say, a single activity takes.

Using your example for ts1:

Make a vector of times (as characters)

time <- c("08:27:23,445",
          "08:27:24,280",
          "08:27:25,115")

Change the comma to a colon, so that every field can be split on the same separator

time.new <- gsub(",", ":", time)

Calculate decimal minutes

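time.mins <- sapply(strsplit(as.character(time.new), ":"),
                    function(x) {
                      # x holds hours, minutes, seconds, milliseconds
                      x <- as.numeric(x)
                      # express every field in minutes and add them up
                      x[1] * 60 + x[2] + x[3] / 60 + x[4] / 60000
                    })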
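If exact arithmetic matters more than a human-friendly unit, the same idea can be expressed in whole milliseconds since midnight, which avoids fractional minutes. This is only a sketch and not part of the original answer; the helper name to_millis is hypothetical, and it assumes every string has the HH:MM:SS,mmm layout shown in the question.

```r
# Hypothetical helper: "HH:MM:SS,mmm" strings -> milliseconds since midnight.
# as.character() lets it accept the factor column from the dput output too.
to_millis <- function(x) {
  parts <- strsplit(gsub(",", ":", as.character(x), fixed = TRUE), ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    # hours, minutes and seconds scaled to milliseconds, plus the ms field
    p[1] * 3600000 + p[2] * 60000 + p[3] * 1000 + p[4]
  }, numeric(1))
}

ts1 <- c("08:27:23,445", "08:27:24,280", "08:27:25,115")
ms <- to_millis(ts1)
diff(ms)  # gaps between consecutive samples, in milliseconds
```

Because all values stay whole numbers, differences between samples come out without rounding noise.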

The result looks like this if you bind the two columns together (note that cbind returns a character matrix here, not a data frame):

> df <- cbind(time, time.mins)
> df
     time           time.mins         
[1,] "08:27:23,445" "507.39075"       
[2,] "08:27:24,280" "507.404666666667"
[3,] "08:27:25,115" "507.418583333333"
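If the original clock layout is needed again after working with numbers, the conversion can be reversed. The sketch below is not part of the original answer; from_millis is a hypothetical helper name, and it assumes the input is a count of milliseconds since midnight on the same day.

```r
# Hypothetical helper: milliseconds since midnight -> "HH:MM:SS,mmm"
from_millis <- function(ms) {
  ms <- as.integer(round(ms))
  h <- ms %/% 3600000L               # whole hours
  m <- (ms %% 3600000L) %/% 60000L   # whole minutes within the hour
  s <- (ms %% 60000L) %/% 1000L      # whole seconds within the minute
  f <- ms %% 1000L                   # leftover milliseconds
  sprintf("%02d:%02d:%02d,%03d", h, m, s, f)
}

# Decimal minutes (as computed above) can be scaled to milliseconds first
back <- from_millis(507.39075 * 60000)
back
```

Rounding before the integer conversion guards against small floating-point errors introduced by the decimal-minute representation.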

I imagine this might be a bit more helpful for something like click-through rates, or whenever you never care about a total gap of more than 24 hours.
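The answer above only covers ts1. For ts2, one possible approach is to skip date parsing altogether and pull the clock portion out with a regular expression, then pad it with ",000" so both series share one layout. This is a sketch under stated assumptions, not the original author's method: it takes the time of day exactly as written and ignores the "GMT+0200" offset, and to_ms is a hypothetical helper. As a side note, ?strptime documents %Z as output-only, which is a likely reason the attempt in the question returned NA.

```r
ts2 <- c("Fri Apr 19 2013 08:39:41 GMT+0200",
         "Fri Apr 19 2013 08:39:43 GMT+0200",
         "Fri Apr 19 2013 08:39:45 GMT+0200")

# First HH:MM:SS token of each string; strings without a match would be
# dropped by regmatches(), so check lengths on real data.
clock <- regmatches(ts2, regexpr("[0-9]{2}:[0-9]{2}:[0-9]{2}", ts2))
ts2_fmt <- paste0(clock, ",000")   # same layout as ts1
ts2_fmt

# Hypothetical helper: "HH:MM:SS,mmm" -> milliseconds since midnight
to_ms <- function(x) {
  sapply(strsplit(as.character(x), "[:,]"),
         function(p) sum(as.numeric(p) * c(3600000, 60000, 1000, 1)))
}

# Gap between the first ts2 and the first ts1 sample, in milliseconds
gap <- to_ms(ts2_fmt[1]) - to_ms("08:27:23,445")
gap
```

Once both series are numeric, matching and differencing become plain vector operations, provided all samples fall on the same day.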
