dplyr: override all but the first occurrence of a value within a group

Problem description

I have a grouped data_frame with a "tag" column taking on values "0" and "1". In each group, I need to find the first occurrence of "1" and change all the remaining occurrences to "0". Is there a way to achieve it in dplyr?

For example, let's take "iris" data and let's add the extra "tag" column:

library(dplyr)

data(iris)
set.seed(1)
# 0/1 tag column: roughly 80% zeros and 20% ones
iris$tag <- sample(c(0, 1), 150, replace = TRUE, prob = c(0.8, 0.2))
giris <- iris %>% group_by(Species)

In "giris", in the "setosa" group I need to keep only the first occurrence of "1" (i.e. in the 4th row) and set the remaining ones to "0". This seems a bit like applying a mask or something...

Is there a way to do it? I have been experimenting with "which" and "duplicated" but I did not succeed. I have been thinking about filtering the "1"s only, keeping them, then joining with the remaining set, but this seems awkward, especially for a 12GB data set.

Recommended answer

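Two dplyr options (because giris is grouped by Species, each expression is evaluated separately within every group):

mutate(giris, newcol = as.integer(tag & cumsum(tag) == 1))

# or, equivalently:
mutate(giris, newcol = as.integer(tag & !duplicated(tag)))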
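To see why both expressions keep only the first 1, here is a minimal base-R sketch on made-up data (the vectors `tag`, `g` and the data frame `df` are hypothetical, not the iris data). The last part uses `ave()` as a dplyr-free way to apply the same rule per group; that is a swapped-in technique for illustration, not part of the original answer.

```r
# Hypothetical 0/1 tag vector standing in for a single group
tag <- c(0, 1, 0, 1, 1, 0)

# Approach 1: cumsum(tag) == 1 is TRUE from the first 1 up to (not including)
# the second 1; ANDing with tag keeps only the position of the first 1.
first_by_cumsum <- as.integer(tag & cumsum(tag) == 1)

# Approach 2: duplicated() is FALSE only at the first 0 and the first 1;
# ANDing with tag discards the first 0 and keeps only the first 1.
first_by_dup <- as.integer(tag & !duplicated(tag))

print(first_by_cumsum)  # [1] 0 1 0 0 0 0
print(first_by_dup)     # [1] 0 1 0 0 0 0

# Same per-group rule in base R: ave() applies FUN within each level of the
# grouping variable and returns a vector aligned with the input rows.
df <- data.frame(g   = c("a", "a", "a", "b", "b", "b"),
                 tag = c(0, 1, 1, 1, 0, 1))
df$newcol <- ave(df$tag, df$g, FUN = function(x) as.integer(x & cumsum(x) == 1))
print(df$newcol)        # [1] 0 1 0 1 0 0
```

A group that contains no 1s comes out as all zeros under either expression.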

Or, using data.table, the same approach but modifying by reference (no copy of the table is made, which matters for a 12GB data set):

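library(data.table)
# Convert the ungrouped data (already holding the tag column) in place
setDT(iris)
# := adds newcol by reference; `by` does the per-Species grouping
iris[, newcol := as.integer(tag & cumsum(tag) == 1), by = Species]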
