Group a dataframe based on sequence breaks in a column?

Views: 77  Published: 2020/10/26 4:33:36  Tags: r dplyr


Question

I have a data.frame with a column of integer values. I need to form a grouping variable that identifies sequence breaks in that column. For instance, I could create another column of ascending integers that increases by one whenever the original column's value is not greater than its lagged value. How do I do this?

E.g. if I have a data.frame like this:

df <- data.frame(A = c(1,2,4,6,78,3,56,78,23))

I need some way to produce a new table with column B:

df$B <- c(1,1,1,1,1,2,2,2,3)

I tried, for example, with dplyr:

df %>% mutate(B = 1,
              B = case_when(A < lag(A), B + 1))

That is not quite correct.

Recommended answer

We can use cumsum and diff, which will increment the value every time the sequence is broken:

cumsum(c(-1, diff(df$A)) < 0)
#[1] 1 1 1 1 1 2 2 2 3
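A step-by-step breakdown may help show why the one-liner works. The sketch below uses the same data as the question; the intermediate names steps, is_break and B are introduced here only for illustration.

```r
df <- data.frame(A = c(1, 2, 4, 6, 78, 3, 56, 78, 23))

# diff() returns the change between consecutive values (length n - 1)
steps <- diff(df$A)

# Prepending -1 restores the length to n and makes the first row count
# as a "break", so the numbering starts at 1 instead of 0
is_break <- c(-1, steps) < 0

# cumsum() treats TRUE as 1 and FALSE as 0, so the running total
# goes up by one at every break
B <- cumsum(is_break)
print(B)
```

Because each comparison only looks at neighbouring rows, the result depends on the row order of the data.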

We can also integrate this into a dplyr chain to get:

library(dplyr)

df %>%
  mutate(B = cumsum(c(-1, diff(A)) < 0))

#   A B
#1  1 1
#2  2 1
#3  4 1
#4  6 1
#5 78 1
#6  3 2
#7 56 2
#8 78 2
#9 23 3
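The question describes a break as a value that is "not greater than" its predecessor, while the expression above only reacts to a strict decrease. The sample data has no repeated neighbours, so both readings give the same B there. If ties should also start a new group, one possible variant is to compare with <= 0. This is an assumption about the intended behaviour, not part of the original answer, and the vector x is made-up illustration data.

```r
# Hypothetical data with a repeated neighbour (2, 2)
x <- c(1, 2, 2, 5, 3)

# Strict decrease only: the tie stays in the same group
strict <- cumsum(c(-1, diff(x)) < 0)

# Tie or decrease: the repeated 2 starts a new group
with_ties <- cumsum(c(-1, diff(x)) <= 0)

print(strict)
print(with_ties)
```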

Using lag, it could be:

df %>%
  mutate(B = cumsum(c(-1, (A - lag(A))[-1]) < 0))
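Once B exists, it can serve as a grouping key. As a minimal sketch in base R (chosen so that it runs without dplyr), the following splits the data at the breaks and summarises each run; the names runs and run_max are illustrative only.

```r
df <- data.frame(A = c(1, 2, 4, 6, 78, 3, 56, 78, 23))
df$B <- cumsum(c(-1, diff(df$A)) < 0)

# One data.frame per ascending run
runs <- split(df, df$B)

# Largest value within each run, named by group id
run_max <- tapply(df$A, df$B, max)
print(run_max)
```

In a dplyr chain, the same idea would be expressed with group_by(B) followed by summarise().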
