Group rows in data frame based on time difference between consecutive rows


Problem description

I have a data frame like this:

YEAR  MONTH  DAY  HOUR     LON     LAT
1860     10    3    13  -19.50    3.00
1860     10    3    17  -19.50    4.00
1860     10    3    21  -19.50    5.00
1860     10    5     5  -20.50    6.00
1860     10    5    13  -21.50    7.00
1860     10    5    17  -21.50    8.00
1860     10    6     1  -22.50    9.00
1860     10    6     5  -22.50   10.00
1860     12    5     9  -22.50   -7.00
1860     12    5    18  -23.50   -8.00
1860     12    5    22  -23.50   -9.00
1860     12    6     6  -24.50  -10.00
1860     12    6    10  -24.50  -11.00
1860     12    6    18  -24.50  -12.00

What I would like to do is calculate the interpolating line for every subset of temporally close points (e.g. where the time difference between consecutive points is less than 4 days; in the example above there are 2 subsets: one from 1860-10-3 to 1860-10-6 and the other from 1860-12-5 to 1860-12-6), and then create an extra column holding the fit correlation coefficient of the interpolating line for the respective subset.

The problem is that I don't know how to subset my data frame properly according to the criteria stated above.

Recommended answer

Here is another possibility, which groups rows where the time difference between consecutive rows is less than 4 days. It assumes the rows are already sorted chronologically.
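The answer's code expects a data frame called df holding the question's table. For anyone who wants to run it, one way to rebuild that data is sketched here; the values are copied from the table in the question, and the numeric column types are an assumption.

```r
# Rebuild the example data from the question (values copied from its table)
df <- data.frame(
  YEAR  = rep(1860, 14),
  MONTH = rep(c(10, 12), times = c(8, 6)),   # 8 October rows, 6 December rows
  DAY   = c(3, 3, 3, 5, 5, 5, 6, 6, 5, 5, 5, 6, 6, 6),
  HOUR  = c(13, 17, 21, 5, 13, 17, 1, 5, 9, 18, 22, 6, 10, 18),
  LON   = c(-19.5, -19.5, -19.5, -20.5, -21.5, -21.5, -22.5, -22.5,
            -22.5, -23.5, -23.5, -24.5, -24.5, -24.5),
  LAT   = c(3:10, -(7:12))
)
```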

# create date variable
df$date <- with(df, as.Date(paste(YEAR, MONTH, DAY, sep = "-")))

# calculate successive differences between dates (in days)
# and flag gaps of 4 days or more, so rows less than 4 days apart stay together
df$gap <- c(0, diff(df$date) >= 4)

# cumulative sum of 'gap' variable
df$group <- cumsum(df$gap) + 1

df    
#    YEAR MONTH DAY HOUR   LON LAT       date gap group
# 1  1860    10   3   13 -19.5   3 1860-10-03   0     1
# 2  1860    10   3   17 -19.5   4 1860-10-03   0     1
# 3  1860    10   3   21 -19.5   5 1860-10-03   0     1
# 4  1860    10   5    5 -20.5   6 1860-10-05   0     1
# 5  1860    10   5   13 -21.5   7 1860-10-05   0     1
# 6  1860    10   5   17 -21.5   8 1860-10-05   0     1
# 7  1860    10   6    1 -22.5   9 1860-10-06   0     1
# 8  1860    10   6    5 -22.5  10 1860-10-06   0     1
# 9  1860    12   5    9 -22.5  -7 1860-12-05   1     2
# 10 1860    12   5   18 -23.5  -8 1860-12-05   0     2
# 11 1860    12   5   22 -23.5  -9 1860-12-05   0     2
# 12 1860    12   6    6 -24.5 -10 1860-12-06   0     2
# 13 1860    12   6   10 -24.5 -11 1860-12-06   0     2
# 14 1860    12   6   18 -24.5 -12 1860-12-06   0     2
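The question also asks for a per-group fit coefficient, which the grouping step alone does not provide. The following is only a sketch under two assumptions: that the "interpolating line" means an ordinary least-squares line of LAT on LON, and that Pearson's r is the coefficient wanted. The column name r and the objects r_by_group and fits are made up for illustration. The snippet rebuilds just the columns it needs, with group as in the output above, so it runs on its own.

```r
# Only the columns needed here; 'group' matches the output shown above
df <- data.frame(
  LON   = c(-19.5, -19.5, -19.5, -20.5, -21.5, -21.5, -22.5, -22.5,
            -22.5, -23.5, -23.5, -24.5, -24.5, -24.5),
  LAT   = c(3:10, -(7:12)),
  group = rep(1:2, times = c(8, 6))
)

# Pearson correlation between LON and LAT within each group
r_by_group <- sapply(split(df, df$group), function(d) cor(d$LON, d$LAT))

# attach each group's coefficient to its rows (hypothetical column name 'r')
df$r <- unname(r_by_group[as.character(df$group)])

# intercept and slope of the per-group least-squares line LAT ~ LON
fits <- lapply(split(df, df$group), function(d) coef(lm(LAT ~ LON, data = d)))
```

For a simple linear regression, the R-squared of the fit equals the square of this correlation, so df$r^2 gives the coefficient of determination if that is what is needed instead.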






Disclaimer: the diff & cumsum part is inspired by this Q&A: "How to partition a vector into groups of regular, consecutive sequences?"
