Index of next occurring record
Problem description
I have a sample dataset of the trajectory of one bike. My objective is to figure out, on average, how much time elapses between visits to station B.
So far, I have been able to simply order the dataset with:
test[order(test$starttime, decreasing = FALSE),]
and find the rows where start_station and end_station equal B:
which(test$start_station == 'B')
which(test$end_station == 'B')
The next part is where I run into trouble. In order to calculate the time that elapses between the bike's visits to Station B, we must take the difftime() between the record where start_station = "B" (bike leaves) and the next occurring record where end_station = "B", even if that record happens to be in the same row (see row 6).
Using the dataset below, we know that the bike spent 510 minutes outside of Station B between 7:30:00 and 16:00:00, 30 minutes between 18:00:00 and 18:30:00, and 210 minutes between 19:00:00 and 22:30:00, which averages to 250 minutes.
How would one reproduce this output in R using difftime()?
> test
   bikeid start_station           starttime end_station             endtime
1       1             A 2017-09-25 01:00:00           B 2017-09-25 01:30:00
2       1             B 2017-09-25 07:30:00           C 2017-09-25 08:00:00
3       1             C 2017-09-25 10:00:00           A 2017-09-25 10:30:00
4       1             A 2017-09-25 13:00:00           C 2017-09-25 13:30:00
5       1             C 2017-09-25 15:30:00           B 2017-09-25 16:00:00
6       1             B 2017-09-25 18:00:00           B 2017-09-25 18:30:00
7       1             B 2017-09-25 19:00:00           A 2017-09-25 19:30:00
8       1             A 2017-09-25 20:00:00           C 2017-09-25 20:30:00
9       1             C 2017-09-25 22:00:00           B 2017-09-25 22:30:00
10      1             B 2017-09-25 23:00:00           C 2017-09-25 23:30:00
Here is the sample data:
> dput(test)
structure(list(bikeid = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), start_station = c("A",
"B", "C", "A", "C", "B", "B", "A", "C", "B"), starttime = structure(c(1506315600,
1506339000, 1506348000, 1506358800, 1506367800, 1506376800, 1506380400,
1506384000, 1506391200, 1506394800), class = c("POSIXct", "POSIXt"
), tzone = ""), end_station = c("B", "C", "A", "C", "B", "B",
"A", "C", "B", "C"), endtime = structure(c(1506317400, 1506340800,
1506349800, 1506360600, 1506369600, 1506378600, 1506382200, 1506385800,
1506393000, 1506396600), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("bikeid",
"start_station", "starttime", "end_station", "endtime"), row.names = c(NA,
-10L), class = "data.frame")
Recommended answer
This calculates the differences as asked, in the order they occur, but does not append them to the data.frame (df1 here stands for the question's test data):
lapply(df1$starttime[df1$start_station == "B"],
       function(x, et) difftime(et[x < et][1], x, units = "mins"),
       et = df1$endtime[df1$end_station == "B"])
[[1]]
Time difference of 510 mins
[[2]]
Time difference of 30 mins
[[3]]
Time difference of 210 mins
[[4]]
Time difference of NA mins
To calculate the average time:
v1 <- sapply(df1$starttime[df1$start_station == "B"],
             function(x, et) difftime(et[x < et][1], x, units = "mins"),
             et = df1$endtime[df1$end_station == "B"])
mean(v1, na.rm = TRUE)
[1] 250
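If the gaps should also be stored in the data frame, one possible approach is sketched below. It is only an illustration under a few assumptions: the column name mins_away and the helper to_time are made up for this example, the data is rebuilt with an explicit UTC time zone so the result does not depend on the local setting (the original dput has tzone = ""), and the end times are derived from the fact that every trip in the sample lasts 30 minutes.

```r
# Hypothetical helper: build POSIXct times on the sample's date, in UTC (an assumption).
to_time <- function(x) as.POSIXct(paste("2017-09-25", x), tz = "UTC")

test <- data.frame(
  bikeid        = 1,
  start_station = c("A", "B", "C", "A", "C", "B", "B", "A", "C", "B"),
  starttime     = to_time(c("01:00:00", "07:30:00", "10:00:00", "13:00:00",
                            "15:30:00", "18:00:00", "19:00:00", "20:00:00",
                            "22:00:00", "23:00:00")),
  end_station   = c("B", "C", "A", "C", "B", "B", "A", "C", "B", "C"),
  stringsAsFactors = FALSE
)
test$endtime <- test$starttime + 30 * 60  # every trip in the sample lasts 30 minutes

# Arrival times at station B, sorted so the first later one is the next arrival.
arrivals <- sort(test$endtime[test$end_station == "B"])

# Rows that depart from B get the minutes until the next arrival at B.
# All other rows, and departures with no later arrival, stay NA.
from_b <- which(test$start_station == "B")
test$mins_away <- NA_real_
test$mins_away[from_b] <- vapply(from_b, function(i) {
  nxt <- arrivals[arrivals > test$starttime[i]][1]  # NA if the bike never returns
  as.numeric(difftime(nxt, test$starttime[i], units = "mins"))
}, numeric(1))

mean(test$mins_away, na.rm = TRUE)
```

The last departure from B (row 10) has no later arrival in the data, so its gap is NA, which is why the mean is taken with na.rm = TRUE.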