R: How to group_by, split or subset by row values

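Question

This is a continuation of my previous question about how to group or split a data frame by row value in R.

The input data frame has changed to: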

library(stringr)  # provides str_c()

id <- str_c("x", 1:22)
val <- c(rep("NO1", 2), "START", rep("yes1", 2), "STOP", "NO",
         "START", "NO1", "START", rep("yes2", 3), "STOP", "NO1",
         "START", rep("NO3", 3), "STOP", "NO1", "STOP")
data <- data.frame(id, val)

The expected output is a data frame whose val column is as follows:

val = c("START", rep("yes1", 2), "STOP", 
        "START","NO1", "START", rep("yes2", 3), "STOP",
        "START", rep("NO3",3), "STOP", "NO1", "STOP")

Recommended answer

Simply put, once every entry that is neither START nor STOP is removed, a START is a valid start point if it is the first START or is immediately preceded by a STOP; similarly, a STOP is a valid end point if it is the last STOP or is immediately followed by a START. Consider this function:

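valid_anchors <- function(x) {
  # Keep only the START/STOP entries and remember their original positions.
  are_anchors <- x %in% c("START", "STOP")
  id <- seq_along(x)[are_anchors]
  x <- x[are_anchors]
  # A START is valid if it is the first anchor or the previous anchor is a STOP.
  start_pos <- which(x == "START" & c("", head(x, -1L)) %in% c("", "STOP"))
  # A STOP is valid if it is the last anchor or the next anchor is a START.
  stop_pos <- which(x == "STOP" & c(tail(x, -1L), "") %in% c("", "START"))
  # Return the original row positions of the valid starts and stops.
  list(id[start_pos], id[stop_pos])
}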
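To see what the function returns, here is a small sketch on a made-up vector (the vector `toy` is an illustration only, not part of the question's data). It contains a nested START and a repeated STOP, so only the outermost pair should survive.

```r
valid_anchors <- function(x) {
  are_anchors <- x %in% c("START", "STOP")
  id <- seq_along(x)[are_anchors]
  x <- x[are_anchors]
  start_pos <- which(x == "START" & c("", head(x, -1L)) %in% c("", "STOP"))
  stop_pos <- which(x == "STOP" & c(tail(x, -1L), "") %in% c("", "START"))
  list(id[start_pos], id[stop_pos])
}

# Hypothetical input: the START at position 3 follows another START,
# and the STOP at position 5 is followed by another STOP.
toy <- c("NO1", "START", "START", "yes", "STOP", "STOP", "NO1")

anchors <- valid_anchors(toy)
anchors[[1]]  # valid start position: 2
anchors[[2]]  # valid stop position: 6
```

The first list element holds the start rows and the second the stop rows, in matching order, so the pair (2, 6) describes one block to keep.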

Then just apply the same approach you got in your last post:

ind <- valid_anchors(data$val)

data[sort(unique(unlist(mapply(`:`, ind[[1]], ind[[2]])))), ]
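The indexing expression above expands each start/stop pair into a run of row numbers and then flattens the runs. A minimal sketch of that step in isolation follows, assuming the anchor positions that valid_anchors() yields for the example data (starts 3, 8, 16 and stops 6, 14, 22). It uses Map() with seq() in place of mapply() with `:`; this is a substitution for readability, not the answer's original code.

```r
# Anchor positions for the example data (assumed from running valid_anchors()).
starts <- c(3L, 8L, 16L)
stops  <- c(6L, 14L, 22L)

# Map() always returns a list: one integer range per start/stop pair.
ranges <- Map(seq, starts, stops)

# Flatten the ranges into a single vector of row indices.
rows <- unlist(ranges)
rows
```

Because valid blocks never overlap and are already in order, sort() and unique() in the original expression act mainly as safeguards.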

Output

    id   val
3   x3 START
4   x4  yes1
5   x5  yes1
6   x6  STOP
8   x8 START
9   x9   NO1
10 x10 START
11 x11  yes2
12 x12  yes2
13 x13  yes2
14 x14  STOP
16 x16 START
17 x17   NO3
18 x18   NO3
19 x19   NO3
20 x20  STOP
21 x21   NO1
22 x22  STOP
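If the kept rows should also be grouped or split per START/STOP block, as the title suggests, one possible sketch is to label each row with its block number and pass that label to split(). The names `starts`, `stops`, `block` and `blocks` are illustrative, the positions are read off the output above (rows 3 to 6, 8 to 14 and 16 to 22), and paste0() stands in for str_c() so the snippet needs only base R.

```r
# Rebuild the example data with base R only.
id <- paste0("x", 1:22)
val <- c(rep("NO1", 2), "START", rep("yes1", 2), "STOP", "NO",
         "START", "NO1", "START", rep("yes2", 3), "STOP", "NO1",
         "START", rep("NO3", 3), "STOP", "NO1", "STOP")
data <- data.frame(id, val)

# Block boundaries taken from the output above (an assumption of this sketch).
starts <- c(3L, 8L, 16L)
stops  <- c(6L, 14L, 22L)

# Row indices of every kept row, plus the block number each row belongs to.
rows  <- unlist(Map(seq, starts, stops))
block <- rep(seq_along(starts), times = stops - starts + 1L)

# One data frame per START/STOP block.
blocks <- split(data[rows, ], block)
names(blocks)
```

The same `block` vector could be added as a column and used with dplyr's group_by() instead of split(), if a single grouped data frame is preferred.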
