Subset where there are at least five consecutive years in a data.frame column

Views: 75. Published: 2020/10/15 19:29:14. Tags: r, dataframe, data.table

Problem description

I have a data.frame / data.table in R as follows:

df <- data.frame(
  ID = c(rep("A", 20)),
  year = c(1968, 1971, 1972, 1973, 1974, 1976, 1978, 1980, 1982, 1984, 1985, 
           1986, 1987, 1988, 1990, 1991, 1992, 1993, 1994, 1995)
)

I'd like to subset df so that only entries belonging to a run of at least five consecutive years are kept. In this example, two periods qualify (1984:1988 and 1990:1995). How can I do this in R?

Recommended answer

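Using diff and cumsum (with the data.table package):

library(data.table)

# Label runs of consecutive years per ID, then keep runs with at least 5 rows
setDT(df)[, grp := cumsum(c(0, diff(year)) > 1), by = ID
          ][, if (.N > 4) .SD, by = .(ID, grp)][, grp := NULL][]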

This gives the desired result:

    ID year
 1:  A 1984
 2:  A 1985
 3:  A 1986
 4:  A 1987
 5:  A 1988
 6:  A 1990
 7:  A 1991
 8:  A 1992
 9:  A 1993
10:  A 1994
11:  A 1995

Explanation:

With grp := cumsum(c(0, diff(year)) > 1), by = ID you create a (temporary) grouping variable for consecutive years for each ID. This assumes the years are sorted in increasing order within each ID.
With if (.N > 4) .SD, by = .(ID, grp) you select only groups with 5 or more consecutive years for each ID.
With grp := NULL you remove the (temporary) grouping variable.
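
To make the grouping step concrete, here is a minimal base R sketch that applies the same diff/cumsum idea to the years from the question. The variable names new_run, run_len and kept are illustrative and do not appear in the original answer.

```r
# Years from the question's example (already sorted in increasing order)
year <- c(1968, 1971, 1972, 1973, 1974, 1976, 1978, 1980, 1982, 1984, 1985,
          1986, 1987, 1988, 1990, 1991, 1992, 1993, 1994, 1995)

# A gap of more than one year marks the start of a new run
new_run <- c(0, diff(year)) > 1

# The running count of gaps gives each run of consecutive years its own label
grp <- cumsum(new_run)

# Size of the run that each year belongs to
run_len <- ave(year, grp, FUN = length)

# Years that sit in a run of at least five consecutive years
kept <- year[run_len >= 5]
print(data.frame(year, grp, run_len))
```

Printing the intermediate columns side by side shows how isolated years each get their own label, while 1984:1988 and 1990:1995 share one label each.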

A comparable approach in base R:
i <- with(df, ave(year, ID, FUN = function(x) {
  # use x (this ID's years) rather than the full year column
  r <- rle(cumsum(c(0, diff(x)) > 1))
  rep(r$lengths, r$lengths)
}))

df[i > 4,] # or df[which(i > 4),]

This gives the same result.
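
The example in the question has a single ID, so it may help to see the per-ID behaviour on data with more than one group. The following is a sketch only: keep_consecutive is a hypothetical helper, df2 is made-up data, and the code assumes year is sorted in increasing order within each ID.

```r
# Hypothetical helper (not from the original answer): keep the rows that belong
# to a run of at least `min_len` consecutive years within each ID.
# Assumes `year` is sorted in increasing order within each ID.
keep_consecutive <- function(df, min_len = 5) {
  run_len <- ave(df$year, df$ID, FUN = function(x) {
    grp <- cumsum(c(0, diff(x)) > 1)  # run label, computed within this ID only
    ave(x, grp, FUN = length)         # size of the run each year belongs to
  })
  df[run_len >= min_len, , drop = FALSE]
}

# Made-up data with two IDs
df2 <- data.frame(
  ID   = c(rep("A", 6), rep("B", 7)),
  year = c(2000, 2001, 2002, 2003, 2004, 2010,
           1990, 1991, 1993, 1994, 1995, 1996, 1997)
)

res <- keep_consecutive(df2)
print(res)
```

Here ID A keeps 2000:2004 and ID B keeps 1993:1997. Computing the run labels from x inside the function is what keeps the groups separate; a run can never span two IDs.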
