Index of next occurring record

Problem description

I have a sample dataset of the trajectory of one bike. My objective is to figure out, on average, how much time elapses between visits to station B.

So far, I have been able to simply order the dataset with:

test[order(test$starttime, decreasing = FALSE),]

and find the rows where start_station or end_station equals B:

 which(test$start_station == 'B')
 which(test$end_station == 'B')

The next part is where I run into trouble. To calculate the time that elapses between the bike's visits to Station B, we must take the difftime() between each record where start_station = "B" (the bike leaves) and the next occurring record where end_station = "B", even if that record happens to be the same row (see row 6).

Using the dataset below, we know that the bike spent 510 minutes outside of Station B between 7:30:00 and 16:00:00, 30 minutes between 18:00:00 and 18:30:00, and 210 minutes between 19:00:00 and 22:30:00, which averages to 250 minutes.
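As a quick sanity check of the figures above, the three intervals can be computed by hand with difftime(). This is only a sketch: the times are copied manually from the table, the names left_b and back_at_b are made up for illustration, and the time zone is pinned to UTC (an assumption; the original data uses the local time zone).

```r
# Departure from B and next return to B, copied by hand from the table.
# tz = "UTC" is an assumption made only so the result is reproducible.
left_b    <- as.POSIXct(c("2017-09-25 07:30:00", "2017-09-25 18:00:00",
                          "2017-09-25 19:00:00"), tz = "UTC")
back_at_b <- as.POSIXct(c("2017-09-25 16:00:00", "2017-09-25 18:30:00",
                          "2017-09-25 22:30:00"), tz = "UTC")

# Element-wise gaps in minutes, converted to plain numbers
gaps <- as.numeric(difftime(back_at_b, left_b, units = "mins"))
print(gaps)        # 510 30 210
print(mean(gaps))  # 250
```

This confirms the target values but does not solve the problem itself, which is finding the matching return record automatically.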

How would one reproduce this output in R using difftime()?

> test
   bikeid start_station           starttime end_station             endtime
1       1             A 2017-09-25 01:00:00           B 2017-09-25 01:30:00
2       1             B 2017-09-25 07:30:00           C 2017-09-25 08:00:00
3       1             C 2017-09-25 10:00:00           A 2017-09-25 10:30:00
4       1             A 2017-09-25 13:00:00           C 2017-09-25 13:30:00
5       1             C 2017-09-25 15:30:00           B 2017-09-25 16:00:00
6       1             B 2017-09-25 18:00:00           B 2017-09-25 18:30:00
7       1             B 2017-09-25 19:00:00           A 2017-09-25 19:30:00
8       1             A 2017-09-25 20:00:00           C 2017-09-25 20:30:00
9       1             C 2017-09-25 22:00:00           B 2017-09-25 22:30:00
10      1             B 2017-09-25 23:00:00           C 2017-09-25 23:30:00

Here is the sample data:

> dput(test)
structure(list(bikeid = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), start_station = c("A", 
"B", "C", "A", "C", "B", "B", "A", "C", "B"), starttime = structure(c(1506315600, 
1506339000, 1506348000, 1506358800, 1506367800, 1506376800, 1506380400, 
1506384000, 1506391200, 1506394800), class = c("POSIXct", "POSIXt"
), tzone = ""), end_station = c("B", "C", "A", "C", "B", "B", 
"A", "C", "B", "C"), endtime = structure(c(1506317400, 1506340800, 
1506349800, 1506360600, 1506369600, 1506378600, 1506382200, 1506385800, 
1506393000, 1506396600), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("bikeid", 
"start_station", "starttime", "end_station", "endtime"), row.names = c(NA, 
-10L), class = "data.frame")


Recommended answer

This calculates the differences as asked, in the order they occur, but does not append them to the data.frame. Here df1 stands for the question's test data frame.

lapply(df1$starttime[df1$start_station == "B"], function(x, et) difftime(et[x < et][1], x, units = "mins"), et = df1$endtime[df1$end_station == "B"])

[[1]]
Time difference of 510 mins

[[2]]
Time difference of 30 mins

[[3]]
Time difference of 210 mins

[[4]]
Time difference of NA mins
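If the result should live in the data.frame after all, one possible approach is to write the minutes into a new column and leave NA on rows that do not start at B. The sketch below rebuilds the sample data so it runs on its own; the helper hm(), the function mins_to_next_b() and the column name mins_away are illustrative names, not part of the original answer, and the time zone is pinned to UTC as an assumption.

```r
# Rebuild the sample data (tz = "UTC" is an assumption for reproducibility)
hm <- function(x) as.POSIXct(paste("2017-09-25", x), tz = "UTC")
df1 <- data.frame(
  bikeid        = 1,
  start_station = c("A", "B", "C", "A", "C", "B", "B", "A", "C", "B"),
  starttime     = hm(c("01:00:00", "07:30:00", "10:00:00", "13:00:00", "15:30:00",
                       "18:00:00", "19:00:00", "20:00:00", "22:00:00", "23:00:00")),
  end_station   = c("B", "C", "A", "C", "B", "B", "A", "C", "B", "C"),
  endtime       = hm(c("01:30:00", "08:00:00", "10:30:00", "13:30:00", "16:00:00",
                       "18:30:00", "19:30:00", "20:30:00", "22:30:00", "23:30:00")),
  stringsAsFactors = FALSE
)

# All arrival times at station B
arrivals_b <- df1$endtime[df1$end_station == "B"]

# Minutes from the departure in row i to the first later arrival at B
# (NA when no such arrival exists)
mins_to_next_b <- function(i) {
  x <- df1$starttime[i]
  as.numeric(difftime(arrivals_b[arrivals_b > x][1], x, units = "mins"))
}

# Fill the new column only on rows where the bike leaves B
dep_rows <- which(df1$start_station == "B")
df1$mins_away <- NA_real_
df1$mins_away[dep_rows] <- vapply(dep_rows, mins_to_next_b, numeric(1))

print(df1[dep_rows, c("starttime", "mins_away")])
```

Iterating over row indices rather than over the POSIXct values keeps the time class intact without relying on how the apply functions handle date-time vectors.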

To calculate the average time:

v1 <- sapply(df1$starttime[df1$start_station == "B"], function(x, et) difftime(et[x < et][1], x, units = "mins"), et = df1$endtime[df1$end_station == "B"])
mean(v1, na.rm = TRUE)

[1] 250
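For larger data sets, a vectorised alternative may be worth considering. The sketch below swaps the per-element sapply() lookup for base R's findInterval(), which counts, for each departure, how many sorted arrivals are at or before it; adding one gives the index of the first strictly later arrival. This is a different technique from the answer above, shown under the same assumptions: the data is rebuilt inline, hm(), dep, arr and v2 are illustrative names, and the time zone is pinned to UTC.

```r
# Rebuild only the columns needed (tz = "UTC" is an assumption)
hm <- function(x) as.POSIXct(paste("2017-09-25", x), tz = "UTC")
df1 <- data.frame(
  start_station = c("A", "B", "C", "A", "C", "B", "B", "A", "C", "B"),
  starttime     = hm(c("01:00:00", "07:30:00", "10:00:00", "13:00:00", "15:30:00",
                       "18:00:00", "19:00:00", "20:00:00", "22:00:00", "23:00:00")),
  end_station   = c("B", "C", "A", "C", "B", "B", "A", "C", "B", "C"),
  endtime       = hm(c("01:30:00", "08:00:00", "10:30:00", "13:30:00", "16:00:00",
                       "18:30:00", "19:30:00", "20:30:00", "22:30:00", "23:30:00")),
  stringsAsFactors = FALSE
)

dep <- df1$starttime[df1$start_station == "B"]      # departures from B
arr <- sort(df1$endtime[df1$end_station == "B"])    # arrivals at B, sorted

# findInterval() returns the number of arrivals <= each departure,
# so +1 points at the first arrival strictly after it
idx <- findInterval(as.numeric(dep), as.numeric(arr)) + 1L

# Indexing past the end of arr yields NA, matching the last element above
v2 <- as.numeric(difftime(arr[idx], dep, units = "mins"))
print(v2)
print(mean(v2, na.rm = TRUE))
```

findInterval() requires its second argument to be sorted in non-decreasing order, which is why arr is passed through sort() first.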
