如何按时间间隔在行中子集和提取时间序列 [英] How to subset and extract time series by time interval in row
问题描述
我正在分析动物的位置,这要求每只动物的位置相隔60分钟或更长时间.动物之间位置的时间差异并不重要.数据集包含动物ID列表以及每个位置的日期和时间,例如以下示例.
I am working on an analysis of animal locations that requires locations for each animal to be 60 minutes or greater apart. Time differences in locations among animals does not matter. The data set has a list of animal IDs and date and time of each location, example below.
例如,对于下面的动物6,从16:19的位置开始,代码将遍历各个位置,直到找到距离16:19超过60分钟的位置.在这种情况下,它将是17:36的位置.然后,代码将从17:36位置开始查找60分钟以上的下一个位置(18:52),依此类推.彼此相距60分钟以上的每个位置都将被提取到一个单独的数据框中.
For example, for animal 6 below, starting at the 16:19 location, the code would iterate through locations until it finds a location that is 60+ minutes from 16:19. In this case it would be the 17:36 location. Then, the code would start from the 17:36 location to find the next location (18:52) 60+ minutes, and so on. Each of the locations 60+ minutes from each other would then be extracted to a separate dataframe.
我已经在R中编写了一个循环以对数据进行子集处理,但是在计算位置是否大于等于60分钟时代码没有考虑到日期变化的问题.
I have wrote a loop in R to subset the data, but having issue with the code not accounting for a change in date when calculating if locations are 60 minutes or greater.
我一直在研究lubridate程序包,看来它可能具有解决子集我的数据的更简单方法.但是,我还没有找到使用此软件包将数据子集化为我的规范的解决方案.任何使用lubridate或其他方法的建议将不胜感激.
I have been exploring the lubridate package, which seems like it may have an easier way to address subsetting my data. However, I have not yet found a solution to subsetting the data to my specifications using this package. Any suggestions for using lubridate or an alternative method would be greatly appreciated.
预先感谢您的考虑.
>data(locdata);
>view(locdata);
id date time
6 30-Jun-09 16:19
6 30-Jun-09 16:31
6 30-Jun-09 17:36
6 30-Jun-09 17:45
6 30-Jun-09 18:00
6 30-Jun-09 18:52
6 7-Aug-10 5:30
6 7-Aug-10 5:45
6 7-Aug-10 6:00
6 7-Aug-10 6:45
23 30-Jun-09 17:15
23 30-Jun-09 17:38
23 30-Jun-09 17:56
23 30-Jun-09 20:00
23 30-Jun-09 22:19
23 18-Jul-11 16:22
23 18-Jul-11 17:50
23 18-Jul-11 18:15
上面示例数据的输出如下所示:
The output from the example data above would look like this:
id date time
6 30-Jun-09 16:19
6 30-Jun-09 17:36
6 30-Jun-09 18:52
6 7-Aug-10 5:30
6 7-Aug-10 6:45
23 30-Jun-09 17:15
23 30-Jun-09 20:00
23 30-Jun-09 22:19
23 18-Jul-11 16:22
23 18-Jul-11 17:50
推荐答案
如果我对您的理解正确,我认为您正在寻找符合以下要求的东西:
If I understood you correctly, I think you're looking for something along these lines:
library(dplyr)
library(lubridate)
locdata %>%
mutate(timestamp = dmy_hm(paste(date, time))) %>%
group_by(id, date) %>%
mutate(delta = timestamp - lag(timestamp))
如果您以前没有使用过dplyr
或magrittr
,则上面的语法可能不清楚. %>%
运算符将每次计算的结果传递给下一个函数,因此上述代码执行以下操作:
If you haven't used dplyr
or magrittr
before, the syntax above may be unclear. The %>%
operator passes the results of each computation to the next function, so the above code does the following:
- 使用
lubridate
将日期和时间解析为R可以理解的时间戳
- 按
id
和唯一的date
s 对数据进行分组
- 在每个组中,计算两次观察之间的持续时间
- Parse the date and time into a timestamp that R understands, using
lubridate
- Group the data by
id
and uniquedate
s - Within each group, calculate the duration between observations
如果要保存输出,请将第一行更改为results <- locdata %>%
.
If you want to save the output, change the first line to something like results <- locdata %>%
.
根据您的最新问题和修改后的数据,我认为这可行:
locdata %>%
mutate(timestamp = dmy_hm(paste(date, time))) %>%
group_by(id, date) %>%
mutate(delta = timestamp - first(timestamp),
steps = as.numeric(floor(delta / 3600)),
change = ifelse(is.na(steps - lag(steps)), 1, steps - lag(steps))) %>%
filter(change > 0) %>%
select(id, date, timestamp)
输出:
Source: local data frame [10 x 3]
Groups: id, date
id date timestamp
1 6 30-Jun-09 2009-06-30 16:19:00
2 6 30-Jun-09 2009-06-30 17:36:00
3 6 30-Jun-09 2009-06-30 18:52:00
4 6 7-Aug-10 2010-08-07 05:30:00
5 6 7-Aug-10 2010-08-07 06:45:00
6 23 30-Jun-09 2009-06-30 17:15:00
7 23 30-Jun-09 2009-06-30 20:00:00
8 23 30-Jun-09 2009-06-30 22:19:00
9 23 18-Jul-11 2011-07-18 16:22:00
10 23 18-Jul-11 2011-07-18 17:50:00
工作原理:
- 像以前一样创建
timestamp
- 按
id
和date
对数据进行分组
- 计算每组中第一个时间戳(即给定一天中对一只动物的第一次观察)与该组中每个后续观察之间的秒差,将其存储在新列
delta
中
- 确定哪些观测值(如果有的话)距第一个观测值超过3600秒,以3600秒为增量;将其存储在新列
steps
中
- 从第一个观察结果确定哪些观察结果是一个或多个(c12)(并同时保留第一个观察结果);将其存储在新列
change
中
- 仅保留
change
为1或更大的观测值-即距上一个观测值和该组中第一个观测值相距一个或多个小时的观测值 - 仅保留感兴趣的列
- Create
timestamp
as before - Group the data by
id
anddate
- Compute the delta in seconds between the first timestamp in each group (i.e. the first observation of one animal in a given day) and each subsequent observation in that group, store that in a new column
delta
- Determine which observations (if any) are more than 3600 seconds from the first one, in increments of 3600 seconds; store that in a new column
steps
- Determine which observations are one or more
step
from the first observation (and keep the first observation as well); store that in a new columnchange
- Keep only observations where
change
is 1 or more -- i.e. where the observation is one or more hours from the previous observation and from the first observation in the group - Keep only the columns of interest
要熟悉其工作原理,请将filter
和select
从末端放下并检查输出:
To get comfortable with how it works, drop the filter
and select
from the end and inspect the output:
Source: local data frame [18 x 7]
Groups: id, date
id date time timestamp delta steps change
1 6 30-Jun-09 16:19 2009-06-30 16:19:00 0 secs 0 1
2 6 30-Jun-09 16:31 2009-06-30 16:31:00 720 secs 0 0
3 6 30-Jun-09 17:36 2009-06-30 17:36:00 4620 secs 1 1
4 6 30-Jun-09 17:45 2009-06-30 17:45:00 5160 secs 1 0
5 6 30-Jun-09 18:00 2009-06-30 18:00:00 6060 secs 1 0
6 6 30-Jun-09 18:52 2009-06-30 18:52:00 9180 secs 2 1
7 6 7-Aug-10 5:30 2010-08-07 05:30:00 0 secs 0 1
8 6 7-Aug-10 5:45 2010-08-07 05:45:00 900 secs 0 0
9 6 7-Aug-10 6:00 2010-08-07 06:00:00 1800 secs 0 0
10 6 7-Aug-10 6:45 2010-08-07 06:45:00 4500 secs 1 1
11 23 30-Jun-09 17:15 2009-06-30 17:15:00 0 secs 0 1
12 23 30-Jun-09 17:38 2009-06-30 17:38:00 1380 secs 0 0
13 23 30-Jun-09 17:56 2009-06-30 17:56:00 2460 secs 0 0
14 23 30-Jun-09 20:00 2009-06-30 20:00:00 9900 secs 2 2
15 23 30-Jun-09 22:19 2009-06-30 22:19:00 18240 secs 5 3
16 23 18-Jul-11 16:22 2011-07-18 16:22:00 0 secs 0 1
17 23 18-Jul-11 17:50 2011-07-18 17:50:00 5280 secs 1 1
18 23 18-Jul-11 18:15 2011-07-18 18:15:00 6780 secs 1 0
这篇关于如何按时间间隔在行中子集和提取时间序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!