如何按时间间隔在行中子集和提取时间序列 [英] How to subset and extract time series by time interval in row

查看:82
本文介绍了如何按时间间隔在行中子集和提取时间序列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在分析动物的位置,这要求每只动物的位置相隔60分钟或更长时间.动物之间位置的时间差异并不重要.数据集包含动物ID列表以及每个位置的日期和时间,例如以下示例.

I am working on an analysis of animal locations that requires locations for each animal to be 60 minutes or greater apart. Time differences in locations among animals does not matter. The data set has a list of animal IDs and date and time of each location, example below.

例如,对于下面的动物6,从16:19的位置开始,代码将遍历各个位置,直到找到距离16:19超过60分钟的位置.在这种情况下,它将是17:36的位置.然后,代码将从17:36位置开始查找60分钟以上的下一个位置(18:52),依此类推.彼此相距60分钟以上的每个位置都将被提取到一个单独的数据框中.

For example, for animal 6 below, starting at the 16:19 location, the code would iterate through locations until it finds a location that is 60+ minutes from 16:19. In this case it would be the 17:36 location. Then, the code would start from the 17:36 location to find the next location (18:52) 60+ minutes, and so on. Each of the locations 60+ minutes from each other would then be extracted to a separate dataframe.

我已经在R中编写了一个循环以对数据进行子集处理,但是在计算位置是否大于等于60分钟时代码没有考虑到日期变化的问题.

I have wrote a loop in R to subset the data, but having issue with the code not accounting for a change in date when calculating if locations are 60 minutes or greater.

我一直在研究lubridate程序包,看来它可能具有解决子集我的数据的更简单方法.但是,我还没有找到使用此软件包将数据子集化为我的规范的解决方案.任何使用lubridate或其他方法的建议将不胜感激.

I have been exploring the lubridate package, which seems like it may have an easier way to address subsetting my data. However, I have not yet found a solution to subsetting the data to my specifications using this package. Any suggestions for using lubridate or an alternative method would be greatly appreciated.

预先感谢您的考虑.

>data(locdata);
>view(locdata);
id  date    time
6   30-Jun-09   16:19
6   30-Jun-09   16:31
6   30-Jun-09   17:36
6   30-Jun-09   17:45
6   30-Jun-09   18:00
6   30-Jun-09   18:52
6   7-Aug-10    5:30
6   7-Aug-10    5:45
6   7-Aug-10    6:00
6   7-Aug-10    6:45
23  30-Jun-09   17:15
23  30-Jun-09   17:38
23  30-Jun-09   17:56
23  30-Jun-09   20:00
23  30-Jun-09   22:19
23  18-Jul-11   16:22
23  18-Jul-11   17:50
23  18-Jul-11   18:15

上面示例数据的输出如下所示:

The output from the example data above would look like this:

id  date    time
6   30-Jun-09   16:19
6   30-Jun-09   17:36
6   30-Jun-09   18:52
6   7-Aug-10    5:30
6   7-Aug-10    6:45
23  30-Jun-09   17:15
23  30-Jun-09   20:00
23  30-Jun-09   22:19
23  18-Jul-11   16:22
23  18-Jul-11   17:50

推荐答案

如果我对您的理解正确,我认为您正在寻找符合以下要求的东西:

If I understood you correctly, I think you're looking for something along these lines:

library(dplyr)
library(lubridate)

locdata %>% 
    mutate(timestamp = dmy_hm(paste(date, time))) %>%
    group_by(id, date) %>%
    mutate(delta = timestamp - lag(timestamp))

如果您以前没有使用过dplyrmagrittr,则上面的语法可能不清楚. %>%运算符将每次计算的结果传递给下一个函数,因此上述代码执行以下操作:

If you haven't used dplyr or magrittr before, the syntax above may be unclear. The %>% operator passes the results of each computation to the next function, so the above code does the following:

  1. 使用lubridate
  2. 将日期和时间解析为R可以理解的时间戳
  3. id和唯一的date s
  4. 对数据进行分组
  5. 在每个组中,计算两次观察之间的持续时间
  1. Parse the date and time into a timestamp that R understands, using lubridate
  2. Group the data by id and unique dates
  3. Within each group, calculate the duration between observations

如果要保存输出,请将第一行更改为results <- locdata %>%.

If you want to save the output, change the first line to something like results <- locdata %>%.

根据您的最新问题和修改后的数据,我认为这可行:

locdata %>% 
    mutate(timestamp = dmy_hm(paste(date, time))) %>%
    group_by(id, date) %>%
    mutate(delta = timestamp - first(timestamp),
           steps = as.numeric(floor(delta / 3600)), 
           change = ifelse(is.na(steps - lag(steps)), 1, steps - lag(steps))) %>%
    filter(change > 0) %>%
    select(id, date, timestamp)

输出:

Source: local data frame [10 x 3]
Groups: id, date

   id      date           timestamp
1   6 30-Jun-09 2009-06-30 16:19:00
2   6 30-Jun-09 2009-06-30 17:36:00
3   6 30-Jun-09 2009-06-30 18:52:00
4   6  7-Aug-10 2010-08-07 05:30:00
5   6  7-Aug-10 2010-08-07 06:45:00
6  23 30-Jun-09 2009-06-30 17:15:00
7  23 30-Jun-09 2009-06-30 20:00:00
8  23 30-Jun-09 2009-06-30 22:19:00
9  23 18-Jul-11 2011-07-18 16:22:00
10 23 18-Jul-11 2011-07-18 17:50:00

工作原理:

  1. 像以前一样创建timestamp
  2. iddate
  3. 对数据进行分组
  4. 计算每组中第一个时间戳(即给定一天中对一只动物的第一次观察)与该组中每个后续观察之间的秒差,将其存储在新列delta
  5. 确定哪些观测值(如果有的话)距第一个观测值超过3600秒,以3600秒为增量;将其存储在新列steps
  6. 从第一个观察结果确定哪些观察结果是一个或多个(c12)(并同时保留第一个观察结果);将其存储在新列change
  7. 仅保留change为1或更大的观测值-即距上一个观测值和该组中第一个观测值相距一个或多个小时的观测值
  8. 仅保留感兴趣的列
  1. Create timestamp as before
  2. Group the data by id and date
  3. Compute the delta in seconds between the first timestamp in each group (i.e. the first observation of one animal in a given day) and each subsequent observation in that group, store that in a new column delta
  4. Determine which observations (if any) are more than 3600 seconds from the first one, in increments of 3600 seconds; store that in a new column steps
  5. Determine which observations are one or more step from the first observation (and keep the first observation as well); store that in a new column change
  6. Keep only observations where change is 1 or more -- i.e. where the observation is one or more hours from the previous observation and from the first observation in the group
  7. Keep only the columns of interest

要熟悉其工作原理,请将filterselect从末端放下并检查输出:

To get comfortable with how it works, drop the filter and select from the end and inspect the output:

Source: local data frame [18 x 7]
Groups: id, date

   id      date  time           timestamp      delta steps change
1   6 30-Jun-09 16:19 2009-06-30 16:19:00     0 secs     0      1
2   6 30-Jun-09 16:31 2009-06-30 16:31:00   720 secs     0      0
3   6 30-Jun-09 17:36 2009-06-30 17:36:00  4620 secs     1      1
4   6 30-Jun-09 17:45 2009-06-30 17:45:00  5160 secs     1      0
5   6 30-Jun-09 18:00 2009-06-30 18:00:00  6060 secs     1      0
6   6 30-Jun-09 18:52 2009-06-30 18:52:00  9180 secs     2      1
7   6  7-Aug-10  5:30 2010-08-07 05:30:00     0 secs     0      1
8   6  7-Aug-10  5:45 2010-08-07 05:45:00   900 secs     0      0
9   6  7-Aug-10  6:00 2010-08-07 06:00:00  1800 secs     0      0
10  6  7-Aug-10  6:45 2010-08-07 06:45:00  4500 secs     1      1
11 23 30-Jun-09 17:15 2009-06-30 17:15:00     0 secs     0      1
12 23 30-Jun-09 17:38 2009-06-30 17:38:00  1380 secs     0      0
13 23 30-Jun-09 17:56 2009-06-30 17:56:00  2460 secs     0      0
14 23 30-Jun-09 20:00 2009-06-30 20:00:00  9900 secs     2      2
15 23 30-Jun-09 22:19 2009-06-30 22:19:00 18240 secs     5      3
16 23 18-Jul-11 16:22 2011-07-18 16:22:00     0 secs     0      1
17 23 18-Jul-11 17:50 2011-07-18 17:50:00  5280 secs     1      1
18 23 18-Jul-11 18:15 2011-07-18 18:15:00  6780 secs     1      0

这篇关于如何按时间间隔在行中子集和提取时间序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆