How to assign IDs for consecutive rows in R split by a given kind of row?
Question
I have a data frame whose rows represent people. For a given family, the first row has the value 1 in column A, and all following rows contain members of the same family until another row has the value 1 in column A. Then a new family starts.
I would like to assign IDs to all families in my dataset. In other words, I would like to take:
A
1
2
3
1
3
3
1
4
and turn it into:
A family_id
1 1
2 1
3 1
1 2
3 2
3 2
1 3
4 3
I'm working with a data frame of 3 million rows, so the simple for-loop solution I came up with is not efficient enough. Also, the family_id values need not be sequential.
I'll take a dplyr solution.
Recommended answer
Data:
df <- data.frame(A = c(1:3,1,3,3,1,4))
Code:
# start a new family whenever A is smaller than the value in the previous row
df$family_id <- cumsum(c(-1, diff(df$A)) < 0)
Result:
# A family_id
#1 1 1
#2 2 1
#3 3 1
#4 1 2
#5 3 2
#6 3 2
#7 1 3
#8 4 3
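To see why the one-liner works, it can help to look at the intermediate vectors. The sketch below wraps the answer's expression in a helper; the name family_ids_by_drop is hypothetical and not part of the original answer. Everything is vectorized base R, so there is no explicit loop over the rows.

```r
# Hypothetical helper wrapping the answer's one-liner.
family_ids_by_drop <- function(a) {
  # Guard: with zero rows, c(-1, diff(a)) would still have length 1.
  if (length(a) == 0) return(integer(0))
  # diff() gives a[i] - a[i-1]; prepending -1 marks row 1 as a "drop",
  # so the first row always opens family 1.
  steps <- c(-1, diff(a))
  # TRUE wherever a value is smaller than the one before it.
  starts <- steps < 0
  # cumsum() over a logical vector counts the starts seen so far,
  # and that running count is the family ID.
  cumsum(starts)
}

df <- data.frame(A = c(1:3, 1, 3, 3, 1, 4))
df$family_id <- family_ids_by_drop(df$A)
print(df$family_id)
# [1] 1 1 1 2 2 2 3 3
```

The same expression can be used inside dplyr's mutate() if you prefer a dplyr pipeline.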
Please note:
This solution starts a new group when a number occurs that is smaller than the previous one.
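A small illustration of that caveat, using a made-up vector (not from the question): a family listed as 1, 2, 3, 2, then a one-person family (1), then a family 1, 4. The drop rule opens a new group at the second 2 even though it is not a 1. It also fails to split the two consecutive 1s, because equal values are not "smaller". Under the question's rule the IDs would be 1 1 1 1 2 3 3.

```r
# Made-up vector illustrating where the drop rule differs from "starts with 1".
a <- c(1, 2, 3, 2, 1, 1, 4)
# The answer's rule: a new group at every decrease.
by_drop <- cumsum(c(-1, diff(a)) < 0)
print(by_drop)
# [1] 1 1 1 2 3 3 3
```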
If it is 100% certain that a new group always begins with a 1, then ronak's solution is perfect.
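ronak's answer is not reproduced on this page. Assuming every family really does start with a 1, one common way to express that rule directly is the cumulative sum of the logical test A == 1. This is a sketch of the idea, not necessarily ronak's exact code. Note that any rows before the first 1 would get ID 0.

```r
df <- data.frame(A = c(1:3, 1, 3, 3, 1, 4))
# Each row where A == 1 opens a new family; cumsum() numbers them 1, 2, 3, ...
df$family_id <- cumsum(df$A == 1)
print(df$family_id)
# [1] 1 1 1 2 2 2 3 3

# On a made-up vector, the 2 after 3 stays in family 1,
# and a second consecutive 1 correctly opens its own family.
print(cumsum(c(1, 2, 3, 2, 1, 1, 4) == 1))
# [1] 1 1 1 1 2 3 3
```

In a dplyr pipeline the same rule would be written as df %>% mutate(family_id = cumsum(A == 1)).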