How to calculate time difference with previous row of a data.frame by group

Problem description

I have a data frame with a sorted POSIXct variable in it. Each row is categorized, and I want to get the time difference between consecutive rows within each category level and add it back to the data frame as a new variable. A reproducible example follows. The function below just creates sample data with random times for the purpose of this question.

random.time <- function(N, start, end) {
  st <- as.POSIXct(start)
  en <- as.POSIXct(end)
  dt <- as.numeric(difftime(en, st, units = "secs"))
  ev <- sort(runif(N, 0, dt))
  rt <- st + ev
  return(rt)
}

The code for simulating the problem is as below:

set.seed(123)
category <- sample(LETTERS[1:5], 20, replace=TRUE)
randtime <- random.time(20, '2015/06/01 08:00:00', '2015/06/01 18:00:00')
df <- data.frame(category, randtime)

The expected resulting data frame is as below:

category  randtime             timediff (secs)
A         2015-06-01 09:05:00    0
A         2015-06-01 09:06:30   90
A         2015-06-01 09:10:00  210
B         2015-06-01 10:18:58    0
B         2015-06-01 10:19:58   60
C         2015-06-01 08:14:00    0
C         2015-06-01 08:16:30  150

Each subgroup in the output will have the first row with timediff value of 0 as there is no previous row. I was able to group by category and call the following function to calculate the differences but could not get it to collate the final output for all category groups.

getTimeDiff <- function(x) {
  no_rows <- nrow(x)
  if(no_rows > 1) {
    for(i in 2:no_rows) {
      t <- x[i, "randtime"] - x[i-1, "randtime"]
    }
  }
}

I have been at this for two days now without luck so would greatly appreciate any help. Thanks.
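
A note on the attempt above: as written, the loop overwrites `t` on every iteration and the function returns nothing, so there is no per-group result to collate. Below is a minimal base R sketch of the same split, apply, combine idea. The helper name `add_time_diff` and the small data frame are assumptions made for illustration; the timestamps are the ones from the expected output in the question, not the random sample.

```r
# Illustrative data using the timestamps from the expected output.
df <- data.frame(
  category = c("A", "C", "A", "B", "C", "A", "B"),
  randtime = as.POSIXct(
    c("2015-06-01 09:05:00", "2015-06-01 08:14:00", "2015-06-01 09:06:30",
      "2015-06-01 10:18:58", "2015-06-01 08:16:30", "2015-06-01 09:10:00",
      "2015-06-01 10:19:58"),
    tz = "UTC"
  )
)

# Hypothetical helper: seconds since the previous row, 0 for the first row.
# Expects the rows of a single category, already sorted by time.
add_time_diff <- function(x) {
  secs <- as.numeric(x$randtime)   # seconds since the epoch
  x$timediff <- c(0, diff(secs))   # vectorised, no explicit loop needed
  x
}

# Sort, split by category, apply the helper, and bind the pieces back together.
df <- df[order(df$category, df$randtime), ]
pieces <- lapply(split(df, df$category, drop = TRUE), add_time_diff)
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)

# The same numbers without an explicit split, using ave() on the sorted data.
via_ave <- ave(as.numeric(df$randtime), df$category,
               FUN = function(s) c(0, diff(s)))
```

The helper returns the modified data frame, which is what lets `do.call(rbind, ...)` reassemble one result for all categories.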

Recommended answer

Try this:

library(dplyr)
df %>%
  arrange(category, randtime) %>%
  group_by(category) %>%
  mutate(diff = randtime - lag(randtime),
         diff_secs = as.numeric(diff, units = 'secs'))

#   category            randtime             diff   diff_secs
#     (fctr)              (time)           (dfft)       (dbl)
# 1        A 2015-06-01 11:10:54         NA hours          NA
# 2        A 2015-06-01 15:35:04   4.402785 hours   15850.027
# 3        A 2015-06-01 17:01:22   1.438395 hours    5178.222
# 4        B 2015-06-01 08:14:46         NA hours          NA
# 5        B 2015-06-01 16:53:43 518.955379 hours 1868239.364
# 6        B 2015-06-01 17:37:48  44.090950 hours  158727.420

You may also want to add replace(is.na(.), 0) to the chain.
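
The `diff` column in the output above is labelled in hours because subtracting two POSIXct values lets R choose the unit automatically. The following is a sketch, not the answer's original code: it pins the unit to seconds with `difftime(..., units = "secs")` and fills the first row of each group with 0 inside `mutate()`. The small data frame is an assumption for illustration and reuses the timestamps from the expected output in the question.

```r
library(dplyr)

# Illustrative data using the timestamps from the expected output.
df <- data.frame(
  category = c("A", "C", "A", "B", "C", "A", "B"),
  randtime = as.POSIXct(
    c("2015-06-01 09:05:00", "2015-06-01 08:14:00", "2015-06-01 09:06:30",
      "2015-06-01 10:18:58", "2015-06-01 08:16:30", "2015-06-01 09:10:00",
      "2015-06-01 10:19:58"),
    tz = "UTC"
  )
)

result <- df %>%
  arrange(category, randtime) %>%
  group_by(category) %>%
  mutate(
    # Explicit units, so every group is measured in seconds.
    diff_secs = as.numeric(
      difftime(randtime, dplyr::lag(randtime), units = "secs")
    ),
    # The first row of each group has no previous row: report 0 instead of NA.
    diff_secs = replace(diff_secs, is.na(diff_secs), 0)
  ) %>%
  ungroup()

print(result)
```

Doing the replacement on the single numeric column keeps the POSIXct column untouched, which a whole-table replacement would not guarantee.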
