Use dplyr to take first and last row in a sequence by group

Question

I'm trying to use dplyr to take the first and last rows of repeated values by group. I'm doing this for efficiency reasons, particularly so that graphing is faster.

This is not a duplicate of Select first and last row from grouped data because I'm not asking for the strict first and last row in a group; I'm asking for the first and last row in a group by level (in my case 1's and 0's) that may appear in multiple chunks.

Here's an example. Say I want to remove all the redundant 1's and 0's from column C while keeping A and B intact.

library(dplyr)  # provides %>%, group_by(), slice(), mutate(), lag() and n()

df <- data.frame(
    A = rep(c("a", "b"), each = 10),
    B = rep(1:10, 2),
    C = c(1,0,0,0,0,0,1,1,1,1,0,0,0,1,0,0,0,0,0,1))

A  B C
a  1 1
a  2 0
a  3 0
a  4 0
a  5 0
a  6 0
a  7 1
a  8 1
a  9 1
a 10 1
b  1 0
b  2 0
b  3 0
b  4 1
b  5 0
b  6 0
b  7 0
b  8 0
b  9 0
b 10 1

The final result should look like this:

A  B C
a  1 1
a  2 0
a  6 0
a  7 1
a 10 1
b  1 0
b  3 0
b  4 1
b  5 0
b  9 0
b 10 1

Using unique will either not remove anything or just take one of the 1's or 0's without retaining the start-and-end quality that I'm trying to achieve. Is there a way to do this without a loop, perhaps using dplyr or forcats?

Recommended answer

I think that slice should get you close:

df %>%
  group_by(A,C) %>%
  slice(c(1, n()))

gives

      A     B     C
  <chr> <int> <dbl>
1     a     2     0
2     a     6     0
3     a     1     1
4     a    10     1
5     b     1     0
6     b     9     0
7     b     4     1
8     b    10     1

This doesn't quite match your expected outcome, though. Here n() returns the number of rows in the current group, so slice(c(1, n())) keeps the first and last row of each group.
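To see why grouping by A and C falls short, it can help to look at the runs of C directly. The sketch below is not part of the original answer; it uses base R's rle() on the question's data to list each run of identical values. Grouping by A and C merges all the 0's (or all the 1's) within a level of A into one group, whereas the expected output treats each run separately.

```r
# Same data as in the question.
df <- data.frame(
  A = rep(c("a", "b"), each = 10),
  B = rep(1:10, 2),
  C = c(1,0,0,0,0,0,1,1,1,1,0,0,0,1,0,0,0,0,0,1)
)

# rle() run-length encodes a vector: one entry per run of equal values.
runs <- rle(df$C)
runs$lengths  # 1 5 4 3 1 5 1
runs$values   # 1 0 1 0 1 0 1

# Each run contributes its first and last row, or a single row if it has length 1.
sum(pmin(runs$lengths, 2))  # 11, the number of rows in the expected result
```

Note that rle() here ignores column A; in this data no run happens to cross from "a" into "b".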

After your edit it is clear that you are not looking for the values within any group that is established (which is what my previous version did). You want to group by those runs of 1's or 0's. For that, you will need to create a column that checks whether or not the run of 1's/0's has changed and then one to identify the groups. Then, slice will work as described before. However, because some of your runs are only 1 row long, we need to only include n() if it is more than 1 (otherwise the 1 row shows up twice).

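df %>%
  mutate(
    # TRUE wherever C differs from the previous row (the first row never counts as a change)
    groupChanged = (C != lag(C, default = C[1])),
    # The running count of changes gives each run of identical values its own id
    toCutBy = cumsum(groupChanged)
  ) %>%
  group_by(toCutBy) %>%
  # First row of each run, plus the last row unless the run has only one row
  slice(c(1, ifelse(n() == 1, NA, n())))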

gives

       A     B     C groupChanged toCutBy
   <chr> <int> <dbl>        <lgl>   <int>
1      a     1     1        FALSE       0
2      a     2     0         TRUE       1
3      a     6     0        FALSE       1
4      a     7     1         TRUE       2
5      a    10     1        FALSE       2
6      b     1     0         TRUE       3
7      b     3     0        FALSE       3
8      b     4     1         TRUE       4
9      b     5     0         TRUE       5
10     b     9     0        FALSE       5
11     b    10     1         TRUE       6
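As a possible variation (not from the original answer), the ifelse()/NA step can be replaced by unique(), which collapses the indices c(1, 1) of a one-row run into a single index. The sketch below also drops the two helper columns so that the result has the same shape as the expected output; the variable name result is only illustrative.

```r
library(dplyr)

# Same data as in the question.
df <- data.frame(
  A = rep(c("a", "b"), each = 10),
  B = rep(1:10, 2),
  C = c(1,0,0,0,0,0,1,1,1,1,0,0,0,1,0,0,0,0,0,1)
)

result <- df %>%
  mutate(
    groupChanged = C != lag(C, default = C[1]),
    toCutBy = cumsum(groupChanged)
  ) %>%
  group_by(toCutBy) %>%
  # unique() turns c(1, 1) into 1, so one-row runs are not duplicated
  slice(unique(c(1, n()))) %>%
  ungroup() %>%
  # Keep only the original columns
  select(A, B, C)

result
```

This avoids relying on how slice() treats NA indices, at the cost of one extra function call per group.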

If the runs of 1 or 0 must stay within the level in column A, you also need to add a check for a change in column A to the call. In this example, it does not have an effect (so returns exactly the same values), but it may be desirable in other instances.

df %>%
  mutate(
    # A run also ends whenever A changes, so no run spans two levels of A
    groupChanged = (C != lag(C, default = C[1]) |
                      A != lag(A, default = A[1])),
    toCutBy = cumsum(groupChanged)
  ) %>%
  group_by(toCutBy) %>%
  slice(c(1, ifelse(n() == 1, NA, n())))
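To illustrate a case where the extra check on A does change the result, here is a small sketch on made-up data (df2 is hypothetical, not from the question) in which a run of 1's straddles the boundary between "a" and "b". It assumes dplyr 1.1.0 or later, where consecutive_id() is available as a shorthand for the lag()/cumsum() run id; with an older dplyr, the mutate() approach above would be needed instead.

```r
library(dplyr)

# Hypothetical data: "a" ends with 1's and "b" starts with 1's,
# so one run of 1's crosses the boundary between the two levels of A.
df2 <- data.frame(
  A = rep(c("a", "b"), each = 4),
  B = rep(1:4, 2),
  C = c(0, 1, 1, 1, 1, 1, 1, 0)
)

# Runs defined by C only: the six 1's form a single run.
ignoring_A <- df2 %>%
  group_by(run = consecutive_id(C)) %>%
  slice(unique(c(1, n()))) %>%
  ungroup() %>%
  select(-run)

# Runs defined by A and C: a change in A also starts a new run.
respecting_A <- df2 %>%
  group_by(run = consecutive_id(A, C)) %>%
  slice(unique(c(1, n()))) %>%
  ungroup() %>%
  select(-run)

nrow(ignoring_A)    # 4
nrow(respecting_A)  # 6
```

Without the check on A, the last 1 of "a" and the first 1 of "b" are treated as interior rows of one long run and are dropped; with it, each level of A keeps its own first and last row.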
