Group rows in data frame based on time difference between consecutive rows
Question
I have a data frame like this:
YEAR MONTH DAY HOUR LON LAT
1860 10 3 13 -19.50 3.00
1860 10 3 17 -19.50 4.00
1860 10 3 21 -19.50 5.00
1860 10 5 5 -20.50 6.00
1860 10 5 13 -21.50 7.00
1860 10 5 17 -21.50 8.00
1860 10 6 1 -22.50 9.00
1860 10 6 5 -22.50 10.00
1860 12 5 9 -22.50 -7.00
1860 12 5 18 -23.50 -8.00
1860 12 5 22 -23.50 -9.00
1860 12 6 6 -24.50 -10.00
1860 12 6 10 -24.50 -11.00
1860 12 6 18 -24.50 -12.00
What I would like to do is calculate the interpolating line for every subset of temporally close points (e.g. where the time difference between consecutive points is less than 4 days; in the example above there are 2 subsets: one from 1860-10-3 to 1860-10-6 and the other from 1860-12-5 to 1860-12-6), and then create an extra column holding the fit correlation coefficient of the respective subset's interpolating line.
The problem is that I don't know how to subset my data frame properly according to the criteria stated above.
Recommended answer
Here is another possibility, which groups rows where the time difference between consecutive rows is less than 4 days.
# create date variable
df$date <- with(df, as.Date(paste(YEAR, MONTH, DAY, sep = "-")))
# calculate successive differences between dates (in days; HOUR is ignored)
# and flag gaps of 4 days or more, so that only rows less than
# 4 days apart stay in the same group
df$gap <- c(0, diff(df$date) >= 4)
# cumulative sum of 'gap' variable
df$group <- cumsum(df$gap) + 1
df
# YEAR MONTH DAY HOUR LON LAT date gap group
# 1 1860 10 3 13 -19.5 3 1860-10-03 0 1
# 2 1860 10 3 17 -19.5 4 1860-10-03 0 1
# 3 1860 10 3 21 -19.5 5 1860-10-03 0 1
# 4 1860 10 5 5 -20.5 6 1860-10-05 0 1
# 5 1860 10 5 13 -21.5 7 1860-10-05 0 1
# 6 1860 10 5 17 -21.5 8 1860-10-05 0 1
# 7 1860 10 6 1 -22.5 9 1860-10-06 0 1
# 8 1860 10 6 5 -22.5 10 1860-10-06 0 1
# 9 1860 12 5 9 -22.5 -7 1860-12-05 1 2
# 10 1860 12 5 18 -23.5 -8 1860-12-05 0 2
# 11 1860 12 5 22 -23.5 -9 1860-12-05 0 2
# 12 1860 12 6 6 -24.5 -10 1860-12-06 0 2
# 13 1860 12 6 10 -24.5 -11 1860-12-06 0 2
# 14 1860 12 6 18 -24.5 -12 1860-12-06 0 2
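Once a group column exists, the second half of the question (a fitted line per subset plus a column with its correlation coefficient) could be approached roughly as follows. This is only a sketch under assumptions that are not in the original answer: the "interpolating line" is taken to be a linear regression of LAT on LON, the coefficient is the Pearson correlation between the two, and the names `fits`, `slope`, `intercept` and `r` are made up for illustration. The gap test is written as `>= 4` so that only differences of less than 4 days stay in one group.

```r
# Rebuild the example data (values copied from the question)
df <- data.frame(
  YEAR  = 1860,
  MONTH = rep(c(10, 12), times = c(8, 6)),
  DAY   = c(3, 3, 3, 5, 5, 5, 6, 6, 5, 5, 5, 6, 6, 6),
  HOUR  = c(13, 17, 21, 5, 13, 17, 1, 5, 9, 18, 22, 6, 10, 18),
  LON   = c(-19.5, -19.5, -19.5, -20.5, -21.5, -21.5, -22.5, -22.5,
            -22.5, -23.5, -23.5, -24.5, -24.5, -24.5),
  LAT   = c(3:10, -7:-12)
)

# Group rows as in the answer: a gap of 4 days or more starts a new group
df$date  <- as.Date(paste(df$YEAR, df$MONTH, df$DAY, sep = "-"))
df$group <- cumsum(c(0, as.numeric(diff(df$date)) >= 4)) + 1

# Assumption: fit LAT ~ LON within each group and keep one summary row per group
fits <- do.call(rbind, lapply(split(df, df$group), function(d) {
  fit <- lm(LAT ~ LON, data = d)
  data.frame(group     = d$group[1],
             intercept = coef(fit)[[1]],
             slope     = coef(fit)[[2]],
             r         = cor(d$LON, d$LAT))
}))

# Attach the group's correlation coefficient to every row of that group
df$r <- fits$r[match(df$group, fits$group)]
```

Using `match()` rather than `merge()` keeps the original row order of `df`. If the track should be fitted against time instead of longitude, only the formula and the `cor()` arguments would need to change.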
Disclaimer: the diff & cumsum part is inspired by this Q&A: How to partition a vector into groups of regular, consecutive sequences?
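For readers unfamiliar with the idiom, here is a minimal sketch of how diff and cumsum partition a plain vector into runs of consecutive values. The vector `x` and the name `run_id` are made up for illustration, and the snippet is not quoted from the linked Q&A.

```r
x <- c(1, 2, 3, 7, 8, 10)

# A new run starts wherever the step from the previous element is not 1;
# the first element always starts a run
run_id <- cumsum(c(TRUE, diff(x) != 1))
# run_id is 1 1 1 2 2 3

# Split the vector into its runs: (1, 2, 3), (7, 8), (10)
runs <- split(x, run_id)
```

The grouping in the answer follows the same pattern, with the test "step is not 1" replaced by a test on the size of the date gap.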