How to subset and extract time series by time interval in row

Question

I am working on an analysis of animal locations that requires the locations for each animal to be 60 minutes or more apart. Time differences between the locations of different animals do not matter. The data set has a list of animal IDs and the date and time of each location; an example is below.

For example, for animal 6 below, starting at the 16:19 location, the code would iterate through the locations until it finds one that is 60+ minutes after 16:19; in this case, the 17:36 location. The code would then start from the 17:36 location and find the next location that is 60+ minutes later (18:52), and so on. Each location that is 60+ minutes after the previously selected one would then be extracted into a separate data frame.
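
The walk-through above can be sketched in R for a single animal's sorted times. This is a minimal illustration, not the poster's loop: the helper name `next_spaced` is hypothetical, the times are assumed to be plain minutes after midnight on one day and already sorted, and the jump to the next location uses base R's `findInterval()` rather than a row-by-row loop.

```r
# Minutes after midnight for animal 6 on 30-Jun-09:
# 16:19, 16:31, 17:36, 17:45, 18:00, 18:52
mins <- c(16 * 60 + 19, 16 * 60 + 31, 17 * 60 + 36,
          17 * 60 + 45, 18 * 60, 18 * 60 + 52)

# Return the indices of the locations kept by the "60+ minutes after the
# last kept location" rule. `mins` must be sorted in increasing order.
next_spaced <- function(mins, gap = 60) {
  if (length(mins) == 0) return(integer(0))
  kept <- 1L
  repeat {
    target <- mins[kept[length(kept)]] + gap
    # With left.open = TRUE, findInterval() counts the values strictly below
    # `target`, so adding 1 gives the first index whose time is >= target.
    nxt <- findInterval(target, mins, left.open = TRUE) + 1L
    if (nxt > length(mins)) break
    kept <- c(kept, nxt)
  }
  kept
}

next_spaced(mins)  # 1 3 6
```

Each iteration adds 60 minutes to the last kept time and jumps straight to the first location at or after that target, which reproduces the 16:19, 17:36, 18:52 sequence described above.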

I have written a loop in R to subset the data, but the code does not account for a change in date when calculating whether locations are 60 minutes or more apart.

I have been exploring the lubridate package, which seems to offer an easier way to subset my data. However, I have not yet found a way to use it to subset the data to my specifications. Any suggestions for using lubridate or an alternative method would be greatly appreciated.
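
One likely source of the date problem is measuring gaps on the time column alone. A minimal lubridate sketch, assuming the package is installed and month abbreviations are in English; the two timestamps are hypothetical, not taken from the data set:

```r
library(lubridate)

# Hypothetical fixes on either side of midnight, written like the data set's
# separate date and time columns pasted together.
a <- dmy_hm(paste("30-Jun-09", "23:30"))
b <- dmy_hm(paste("1-Jul-09", "00:40"))

# Both values carry their full date, so the gap is measured across midnight.
gap <- as.numeric(difftime(b, a, units = "mins"))
gap  # 70
```

Comparing only the clock times (00:40 minus 23:30) would give a negative gap, whereas the combined date-times give the expected 70 minutes.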

Thanks in advance for your consideration.

>data(locdata);
>view(locdata);
id  date    time
6   30-Jun-09   16:19
6   30-Jun-09   16:31
6   30-Jun-09   17:36
6   30-Jun-09   17:45
6   30-Jun-09   18:00
6   30-Jun-09   18:52
6   7-Aug-10    5:30
6   7-Aug-10    5:45
6   7-Aug-10    6:00
6   7-Aug-10    6:45
23  30-Jun-09   17:15
23  30-Jun-09   17:38
23  30-Jun-09   17:56
23  30-Jun-09   20:00
23  30-Jun-09   22:19
23  18-Jul-11   16:22
23  18-Jul-11   17:50
23  18-Jul-11   18:15

The output from the example data above would look like this:

id  date    time
6   30-Jun-09   16:19
6   30-Jun-09   17:36
6   30-Jun-09   18:52
6   7-Aug-10    5:30
6   7-Aug-10    6:45
23  30-Jun-09   17:15
23  30-Jun-09   20:00
23  30-Jun-09   22:19
23  18-Jul-11   16:22
23  18-Jul-11   17:50
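
For comparison, a base-R sketch that applies the iterative rule from the question to the whole example data set. This is an illustrative alternative, not the poster's loop or the answer below: `keep_spaced` is a hypothetical helper, each animal is filtered on its full date-time (grouping by id only, so gaps that cross a change of date are still measured), and parsing with `%b` assumes an English or C time locale.

```r
# The example data from the question, rebuilt as a data frame.
locdata <- read.table(text = "
id date      time
6  30-Jun-09 16:19
6  30-Jun-09 16:31
6  30-Jun-09 17:36
6  30-Jun-09 17:45
6  30-Jun-09 18:00
6  30-Jun-09 18:52
6  7-Aug-10  5:30
6  7-Aug-10  5:45
6  7-Aug-10  6:00
6  7-Aug-10  6:45
23 30-Jun-09 17:15
23 30-Jun-09 17:38
23 30-Jun-09 17:56
23 30-Jun-09 20:00
23 30-Jun-09 22:19
23 18-Jul-11 16:22
23 18-Jul-11 17:50
23 18-Jul-11 18:15
", header = TRUE, stringsAsFactors = FALSE)

# One POSIXct per row, so gaps are measured across date changes.
# "%b" assumes English month abbreviations (English or C time locale).
locdata$timestamp <- as.POSIXct(paste(locdata$date, locdata$time),
                                format = "%d-%b-%y %H:%M", tz = "UTC")

# Keep the first fix, then each fix that is at least `min_gap` minutes
# after the most recently *kept* fix. `times` must be sorted.
keep_spaced <- function(times, min_gap = 60) {
  keep <- logical(length(times))
  if (length(times) == 0) return(keep)
  keep[1] <- TRUE
  last_kept <- times[1]
  for (i in seq_along(times)[-1]) {
    gap <- as.numeric(difftime(times[i], last_kept, units = "mins"))
    if (gap >= min_gap) {
      keep[i] <- TRUE
      last_kept <- times[i]
    }
  }
  keep
}

# Apply the rule separately to each animal (grouped by id only).
spaced <- do.call(rbind, lapply(split(locdata, locdata$id), function(g) {
  g <- g[order(g$timestamp), ]
  g[keep_spaced(g$timestamp), c("id", "date", "time")]
}))
rownames(spaced) <- NULL
spaced
```

On the example data this returns the ten rows listed in the expected output above.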

Answer

If I understood you correctly, I think you're looking for something along these lines:

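library(dplyr)
library(lubridate)

locdata %>% 
    # combine the separate date and time columns into one POSIXct timestamp
    mutate(timestamp = dmy_hm(paste(date, time))) %>%
    group_by(id, date) %>%
    # time since the previous observation of the same animal on the same day
    # (NA for the first observation in each group)
    mutate(delta = difftime(timestamp, lag(timestamp), units = "mins"))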

If you haven't used dplyr or magrittr before, the syntax above may be unclear. The %>% operator passes the results of each computation to the next function, so the above code does the following:

  1. Use lubridate to parse the date and time into a timestamp that R understands
  2. Group the data by id and unique dates
  3. Within each group, compute the duration between observations
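
For readers new to the operator: `x %>% f(y)` is evaluated as `f(x, y)`. A tiny sketch, assuming magrittr is installed (dplyr re-exports the same `%>%`), with made-up numbers rather than the location data:

```r
library(magrittr)

# With the pipe: each result becomes the first argument of the next call
piped <- c(1, 4, 9) %>% sqrt() %>% sum()

# The same computation written as nested calls
nested <- sum(sqrt(c(1, 4, 9)))

identical(piped, nested)  # TRUE
piped                     # 6
```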

If you want to save the output, change the first line to something like results <- locdata %>%.

Based on your updated question and revised data, I believe this works:

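locdata %>% 
    mutate(timestamp = dmy_hm(paste(date, time))) %>%
    group_by(id, date) %>%
    # elapsed time since the animal's first observation that day (difftime)
    mutate(delta = timestamp - first(timestamp),
           # whole hours elapsed since that first observation
           steps = floor(as.numeric(delta, units = "hours")),
           # steps advanced since the previous observation; the first row gets 1
           change = steps - lag(steps, default = -1)) %>%
    filter(change > 0) %>%
    select(id, date, timestamp)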

Output:

Source: local data frame [10 x 3]
Groups: id, date

   id      date           timestamp
1   6 30-Jun-09 2009-06-30 16:19:00
2   6 30-Jun-09 2009-06-30 17:36:00
3   6 30-Jun-09 2009-06-30 18:52:00
4   6  7-Aug-10 2010-08-07 05:30:00
5   6  7-Aug-10 2010-08-07 06:45:00
6  23 30-Jun-09 2009-06-30 17:15:00
7  23 30-Jun-09 2009-06-30 20:00:00
8  23 30-Jun-09 2009-06-30 22:19:00
9  23 18-Jul-11 2011-07-18 16:22:00
10 23 18-Jul-11 2011-07-18 17:50:00

How it works:

  1. Create timestamp as before
  2. Group the data by id and date
  3. Compute the delta in seconds between the first timestamp in each group (i.e. the first observation of one animal on a given day) and each subsequent observation in that group; store that in a new column delta
  4. Convert delta into the number of whole hours elapsed since the first observation (0 for under 3600 seconds, 1 for 3600 to 7199 seconds, and so on); store that in a new column steps
  5. Determine how many steps each observation has advanced relative to the previous observation in the group (the first observation gets 1 so that it is kept); store that in a new column change
  6. Keep only observations where change is 1 or more, i.e. where the observation falls in a later whole-hour step, counted from the first observation in the group, than the previous observation
  7. Keep only the columns of interest
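
The `steps`/`change` logic can be checked on a plain numeric vector. A minimal base-R sketch, using hypothetical fixes at 0, 119 and 121 minutes after an animal's first fix of the day (not from the data set): the whole-hour bucketing keeps all three, even though the last two are only 2 minutes apart, while the iterative rule from the question keeps only 0 and 119. On the example data the two rules happen to give the same result.

```r
# Hypothetical fixes for one animal on one day, in minutes after its first fix.
mins <- c(0, 119, 121)

# Answer's rule: whole hours since the first fix, keep a row when that
# count increases relative to the previous row (the first row is always kept).
steps <- floor(mins / 60)
change <- c(1, diff(steps))
bucket_keep <- mins[change > 0]

# Question's rule: keep a fix only if it is >= 60 minutes after the last kept fix.
greedy_keep <- mins[1]
for (m in mins[-1]) {
  if (m - greedy_keep[length(greedy_keep)] >= 60) {
    greedy_keep <- c(greedy_keep, m)
  }
}

bucket_keep  # 0 119 121
greedy_keep  # 0 119
```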

To get comfortable with how it works, drop the filter and select from the end and inspect the output:

Source: local data frame [18 x 7]
Groups: id, date

   id      date  time           timestamp      delta steps change
1   6 30-Jun-09 16:19 2009-06-30 16:19:00     0 secs     0      1
2   6 30-Jun-09 16:31 2009-06-30 16:31:00   720 secs     0      0
3   6 30-Jun-09 17:36 2009-06-30 17:36:00  4620 secs     1      1
4   6 30-Jun-09 17:45 2009-06-30 17:45:00  5160 secs     1      0
5   6 30-Jun-09 18:00 2009-06-30 18:00:00  6060 secs     1      0
6   6 30-Jun-09 18:52 2009-06-30 18:52:00  9180 secs     2      1
7   6  7-Aug-10  5:30 2010-08-07 05:30:00     0 secs     0      1
8   6  7-Aug-10  5:45 2010-08-07 05:45:00   900 secs     0      0
9   6  7-Aug-10  6:00 2010-08-07 06:00:00  1800 secs     0      0
10  6  7-Aug-10  6:45 2010-08-07 06:45:00  4500 secs     1      1
11 23 30-Jun-09 17:15 2009-06-30 17:15:00     0 secs     0      1
12 23 30-Jun-09 17:38 2009-06-30 17:38:00  1380 secs     0      0
13 23 30-Jun-09 17:56 2009-06-30 17:56:00  2460 secs     0      0
14 23 30-Jun-09 20:00 2009-06-30 20:00:00  9900 secs     2      2
15 23 30-Jun-09 22:19 2009-06-30 22:19:00 18240 secs     5      3
16 23 18-Jul-11 16:22 2011-07-18 16:22:00     0 secs     0      1
17 23 18-Jul-11 17:50 2011-07-18 17:50:00  5280 secs     1      1
18 23 18-Jul-11 18:15 2011-07-18 18:15:00  6780 secs     1      0
