Get the time between consecutive dates stored in a single column
Question
I am trying to figure out how to get the time between consecutive events when events are stored as a column of dates in a dataframe.
sampledf=structure(list(cust = c(1L, 1L, 1L, 1L), date = structure(c(9862,
9879, 10075, 10207), class = "Date")), .Names = c("cust", "date"
), row.names = c(NA, -4L), class = "data.frame")
I can get the answer with
as.numeric(rev(rev(difftime(c(sampledf$date[-1],0),sampledf$date))[-1]))
# [1] 17 196 132
but it is really ugly. Among other things, I only know how to exclude the first item in a vector, but not the last so I have to rev() twice to drop the last value.
Is there a better way?
By the way, I will use ddply to do this to a larger set of data for each cust id, so the solution would need to work with ddply.
library(plyr)
ddply(sampledf,
      c("cust"),
      summarize,
      daysBetween = as.numeric(rev(rev(difftime(c(date[-1],0),date))[-1]))
)
Thanks!
Recommended answer
Are you looking for this?
as.numeric(diff(sampledf$date))
# [1] 17 196 132
To remove the last element, use head:
head(as.numeric(diff(sampledf$date)), -1)
# [1] 17 196
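As a rough sketch of how the two pieces fit together: diff() on a Date vector should give the same result as subtracting the vector without its last element from the vector without its first element, which is what the original rev() trick was doing by hand. The vector d below just rebuilds the dates from the question, and the names d, manual and builtin are only illustrative.

```r
# Rebuild the dates from the question (stored as days since 1970-01-01).
d <- as.Date(c(9862, 9879, 10075, 10207), origin = "1970-01-01")

# head(x, -1) drops the last element; x[-1] drops the first.
# Subtracting the two gives the gap between each pair of neighbours.
manual <- as.numeric(d[-1] - head(d, -1))

# diff() computes the same lagged differences in one call.
builtin <- as.numeric(diff(d))

print(manual)
print(isTRUE(all.equal(manual, builtin)))
```

So head(x, -1) replaces the double rev(), and diff() replaces the whole expression.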
require(plyr)
ddply(sampledf, .(cust), summarise, daysBetween = as.numeric(diff(date)))
#   cust daysBetween
# 1    1          17
# 2    1         196
# 3    1         132
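If plyr is not available, the same per-customer calculation can be sketched in base R with split() and lapply(). This is only one possible approach, not part of the original answer. The data frame df2 and its second customer are made up for illustration, and the sort step assumes the dates may not already be in order.

```r
# Hypothetical data: two customers, dates given as days since 1970-01-01.
df2 <- data.frame(
  cust = c(1L, 1L, 1L, 2L, 2L),
  date = as.Date(c(9862, 9879, 10075, 10000, 10030), origin = "1970-01-01")
)

# Sort by customer and date so diff() compares consecutive events.
df2 <- df2[order(df2$cust, df2$date), ]

# Split the dates by customer and take the gaps within each group.
gaps <- lapply(split(df2$date, df2$cust), function(d) as.numeric(diff(d)))

# Flatten the list into a data frame with one row per gap.
result <- data.frame(
  cust = rep(as.integer(names(gaps)), lengths(gaps)),
  daysBetween = unlist(gaps, use.names = FALSE)
)
print(result)
```

In this sketch a customer with only one event contributes no rows, since there is no gap to measure.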