Assign unique ID based on two columns
Question
I have a dataframe (df) that looks like this:
School Student Year
A 10 1999
A 10 2000
A 20 1999
A 20 2000
A 20 2001
B 10 1999
B 10 2000
And I would like to create a person ID column so that df looks like this:
ID School Student Year
1 A 10 1999
1 A 10 2000
2 A 20 1999
2 A 20 2000
2 A 20 2001
3 B 10 1999
3 B 10 2000
In other words, the ID variable indicates which person a row belongs to, accounting for both Student number and School membership (here we have 3 students in total).
I did df$ID <- df$Student and then tried to increase the value by 1 whenever the c("School", "Student") combination was unique. It isn't working. Help appreciated.
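For reference, the idea described in the question (hand out a new ID whenever a not-yet-seen School/Student combination appears) can be made to work as an explicit loop. This is a minimal sketch, not the asker's code: the names `lookup` and `next_id` are hypothetical, and the `"|"` separator assumes that character never occurs in either column.

```r
# Example data from the question
df <- data.frame(
  School  = c("A", "A", "A", "A", "A", "B", "B"),
  Student = c(10, 10, 20, 20, 20, 10, 10),
  Year    = c(1999, 2000, 1999, 2000, 2001, 1999, 2000)
)

ids     <- integer(nrow(df))  # ID assigned to each row
lookup  <- integer(0)         # hypothetical map: "School|Student" key -> ID
next_id <- 0L                 # last ID handed out so far

for (i in seq_len(nrow(df))) {
  # Combine the two columns into one key; assumes "|" never appears in them
  key <- paste(df$School[i], df$Student[i], sep = "|")
  if (!(key %in% names(lookup))) {
    # First time this pair is seen: give it the next ID
    next_id <- next_id + 1L
    lookup[key] <- next_id
  }
  # Rows of an already-seen pair reuse that pair's ID
  ids[i] <- lookup[[key]]
}

df$ID <- ids
df$ID
# [1] 1 1 2 2 2 3 3
```

Looping row by row is slow on large data; the vectorized answers below do the same bookkeeping in one call.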
Answer
We can do this in base R without any group-by operation:
# Flag the first row of each (School, Student) pair, then number the pairs with a running count
df$ID <- cumsum(!duplicated(df[1:2]))
df
# School Student Year ID
#1 A 10 1999 1
#2 A 10 2000 1
#3 A 20 1999 2
#4 A 20 2000 2
#5 A 20 2001 2
#6 B 10 1999 3
#7 B 10 2000 3
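To see why the one-liner works, it may help to look at the intermediate vector. duplicated() on the first two columns flags every row whose (School, Student) pair already appeared earlier; negating it marks the first row of each pair, and cumsum() counts those marks (TRUE counts as 1), so every row gets the number of distinct pairs seen so far. The variable names `first_row` and `ids` are only for illustration.

```r
df <- data.frame(
  School  = c("A", "A", "A", "A", "A", "B", "B"),
  Student = c(10, 10, 20, 20, 20, 10, 10),
  Year    = c(1999, 2000, 1999, 2000, 2001, 1999, 2000)
)

# TRUE marks the first row of each (School, Student) pair
first_row <- !duplicated(df[1:2])
first_row
# [1]  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE

# Running count of "first rows" seen so far = group number
ids <- cumsum(first_row)
ids
# [1] 1 1 2 2 2 3 3
```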
注意:假设学校"和学生"是有序的
NOTE: Assuming that 'School' and 'Student' are ordered
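If the rows are not sorted, one order-independent alternative (a swapped-in base R technique, not part of the original answer) is to build one key per row and number each key by its first appearance with match(). The small unsorted data frame below is made up for illustration, and the `"|"` separator assumes that character never appears in the two columns.

```r
# Hypothetical unsorted data: rows of the same pair are not adjacent
df <- data.frame(
  School  = c("B", "A", "A", "B", "A"),
  Student = c(10, 20, 10, 10, 20),
  Year    = c(1999, 1999, 2000, 2000, 2000)
)

# The running-count trick breaks here: rows 4 and 5 get new IDs
naive <- cumsum(!duplicated(df[1:2]))
naive
# [1] 1 2 3 3 3

# One key string per row; match() gives each key's position among the
# distinct keys, so IDs follow order of first appearance
key <- paste(df$School, df$Student, sep = "|")
df$ID <- match(key, unique(key))
df$ID
# [1] 1 2 3 1 2
```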
Or using the tidyverse:
library(dplyr)
df %>%
mutate(ID = group_indices_(df, .dots=c("School", "Student")))
# School Student Year ID
#1 A 10 1999 1
#2 A 10 2000 1
#3 A 20 1999 2
#4 A 20 2000 2
#5 A 20 2001 2
#6 B 10 1999 3
#7 B 10 2000 3
As @radek mentioned, recent versions of dplyr (dplyr_0.8.0) report that group_indices_ is deprecated; use group_indices instead:
df %>%
mutate(ID = group_indices(., School, Student))
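The base R and dplyr answers can number groups differently on unsorted data: cumsum(!duplicated()) counts pairs in order of appearance, while dplyr's grouping sorts the groups by the grouping columns, so (A, 10) gets ID 1 wherever it first appears. (In dplyr 1.0.0 and later, group_by(School, Student) followed by mutate(ID = cur_group_id()) should give the same sorted numbering.) A base R sketch that mimics the sorted numbering is to sort the distinct pairs and look each row up among them. It assumes plain ASCII values where locale-dependent sorting does not matter; the data frame and the `"|"` separator are illustrative assumptions.

```r
# Hypothetical unsorted data
df <- data.frame(
  School  = c("B", "A", "A", "B", "A"),
  Student = c(10, 20, 10, 10, 20),
  Year    = c(1999, 1999, 2000, 2000, 2000)
)

# Distinct (School, Student) pairs, sorted by School, then Student
pairs <- unique(df[c("School", "Student")])
pairs <- pairs[order(pairs$School, pairs$Student), ]

# Each row's ID is the position of its pair in the sorted list
row_key  <- paste(df$School, df$Student, sep = "|")
pair_key <- paste(pairs$School, pairs$Student, sep = "|")
df$ID <- match(row_key, pair_key)
df$ID
# [1] 3 2 1 3 2
```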