Assign unique ID based on two columns

Problem description

I have a dataframe (df) that looks like this:

School Student  Year  
A         10    1999
A         10    2000
A         20    1999
A         20    2000
A         20    2001
B         10    1999
B         10    2000

And I would like to create a person ID column so that df looks like this:

ID School Student  Year  
1   A         10    1999
1   A         10    2000
2   A         20    1999
2   A         20    2000
2   A         20    2001
3   B         10    1999
3   B         10    2000

In other words, the ID variable indicates which person it is in the dataset, accounting for both Student number and School membership (here we have 3 students total).

I tried df$ID <- df$Student and then attempted to add 1 whenever the combination c("School", "Student") was unique, but it isn't working. Help appreciated.

Recommended answer

We can do this in base R without any group-by operation:

df$ID <- cumsum(!duplicated(df[1:2]))
df
#   School Student Year ID
#1      A      10 1999  1
#2      A      10 2000  1
#3      A      20 1999  2
#4      A      20 2000  2
#5      A      20 2001  2
#6      B      10 1999  3
#7      B      10 2000  3
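
To make the one-liner above easier to follow, here is a minimal sketch that splits it into its two steps. The data frame is rebuilt from the question, and the names first_seen and ids are only illustrative.

```r
# Example data, rebuilt from the table in the question
df <- data.frame(
  School  = c("A", "A", "A", "A", "A", "B", "B"),
  Student = c(10, 10, 20, 20, 20, 10, 10),
  Year    = c(1999, 2000, 1999, 2000, 2001, 1999, 2000)
)

# Step 1: TRUE on the first row of each (School, Student) pair,
# FALSE on every later row that repeats a pair already seen
first_seen <- !duplicated(df[c("School", "Student")])

# Step 2: a running count of the TRUE values; it goes up by one
# each time a new pair starts and stays flat on repeated rows
ids <- cumsum(first_seen)
```

Because the count only increases when a new pair is first seen, repeated rows inherit the ID of the most recent new pair.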

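NOTE: This assumes the rows are sorted by 'School' and 'Student', so that all rows for the same pair are adjacent. If a pair reappears after a different pair, its later rows get the wrong ID.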
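
If the rows cannot be assumed to be sorted, one possible base R alternative (not part of the original answer) is to build a combined key and look it up with match(). This is a sketch: the "|" separator is an assumption and should be a string that never occurs in the data.

```r
# Example data, rebuilt from the table in the question
df <- data.frame(
  School  = c("A", "A", "A", "A", "A", "B", "B"),
  Student = c(10, 10, 20, 20, 20, 10, 10),
  Year    = c(1999, 2000, 1999, 2000, 2001, 1999, 2000)
)

# One key per row built from the two columns; the "|" separator is an
# assumption and must not appear in the column values themselves
key <- paste(df$School, df$Student, sep = "|")

# match() gives the position of each key among the distinct keys,
# so IDs follow order of first appearance and do not depend on sorting
df$ID <- match(key, unique(key))
```

With this approach, IDs are numbered in order of first appearance rather than in sorted order, so the same pair always maps to the same ID even when its rows are scattered.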

Or using tidyverse:

library(dplyr)
df %>% 
    mutate(ID = group_indices_(df, .dots=c("School", "Student"))) 
#  School Student Year ID
#1      A      10 1999  1
#2      A      10 2000  1
#3      A      20 1999  2
#4      A      20 2000  2
#5      A      20 2001  2
#6      B      10 1999  3
#7      B      10 2000  3

As @radek mentioned, recent versions (dplyr 0.8.0) warn that group_indices_ is deprecated; use group_indices instead:

df %>% 
   mutate(ID = group_indices(., School, Student))
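
As far as I know, dplyr 1.0.0 and later also provide cur_group_id(), which can be used after an explicit group_by(). The following is a sketch under that version assumption, with the data frame rebuilt from the question; it is not part of the original answer.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for cur_group_id()

# Example data, rebuilt from the table in the question
df <- data.frame(
  School  = c("A", "A", "A", "A", "A", "B", "B"),
  Student = c(10, 10, 20, 20, 20, 10, 10),
  Year    = c(1999, 2000, 1999, 2000, 2001, 1999, 2000)
)

df <- df %>%
  group_by(School, Student) %>%    # one group per (School, Student) pair
  mutate(ID = cur_group_id()) %>%  # integer index of the current group
  ungroup()                        # drop the grouping afterwards
```

Grouping explicitly does not depend on the row order, and the final ungroup() keeps later operations from silently running per group.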
