R - Group by variable and then assign a unique ID
Problem description
I am interested in de-identifying a sensitive data set that contains both time-fixed and time-variant values. I want to (a) group all cases by social security number, (b) assign those cases a unique ID, and then (c) remove the social security number.
Example data set:
personal_id gender temperature
111-11-1111 M 99.6
999-999-999 F 98.2
111-11-1111 M 97.8
999-999-999 F 98.3
888-88-8888 F 99.0
111-11-1111 M 98.9
Any solutions would be very much appreciated.
Recommended answer
dplyr has a group_indices function for creating unique group IDs:
library(dplyr)
# Small example data set with one repeated personal_id
data <- data.frame(personal_id = c("111-111-111", "999-999-999", "222-222-222", "111-111-111"),
                   gender = c("M", "F", "M", "M"),
                   temperature = c(99.6, 98.2, 97.8, 95.5))
# Assign one integer ID per distinct personal_id
data$group_id <- data %>% group_indices(personal_id)
# Drop the identifying column
data <- data %>% select(-personal_id)
data
  gender temperature group_id
1      M        99.6        1
2      F        98.2        3
3      M        97.8        2
4      M        95.5        1
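If dplyr is not available, steps (a) to (c) from the question can also be sketched in base R. This is only an illustration on the question's example data, not part of the original answer. Note that match() against unique() numbers the IDs by order of first appearance, whereas group_indices numbers them by the sorted order of the grouping values, so the integers may differ between the two approaches.

```r
# Example data from the question (personal_id values copied verbatim)
data <- data.frame(
  personal_id = c("111-11-1111", "999-999-999", "111-11-1111",
                  "999-999-999", "888-88-8888", "111-11-1111"),
  gender = c("M", "F", "M", "F", "F", "M"),
  temperature = c(99.6, 98.2, 97.8, 98.3, 99.0, 98.9),
  stringsAsFactors = FALSE
)

# (a) + (b): give each distinct personal_id an integer ID,
# numbered by the order in which it first appears
data$group_id <- match(data$personal_id, unique(data$personal_id))

# (c): remove the identifying column
data$personal_id <- NULL

data
```

Using as.integer(factor(data$personal_id)) instead of match() would number the IDs by sorted order of the values.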
Alternatively, with dplyr, in a single pipeline (https://github.com/tidyverse/dplyr/issues/2160):
data %>%
  mutate(group_id = group_indices(., personal_id))
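As far as I know, dplyr 1.0.0 and later deprecate passing grouping variables directly to group_indices and suggest group_by() followed by cur_group_id() instead. The following is a minimal sketch under that assumption, reusing the example data from the answer; the name deidentified is only illustrative.

```r
library(dplyr)

# Same example data as in the answer above
data <- data.frame(
  personal_id = c("111-111-111", "999-999-999", "222-222-222", "111-111-111"),
  gender = c("M", "F", "M", "M"),
  temperature = c(99.6, 98.2, 97.8, 95.5)
)

deidentified <- data %>%
  group_by(personal_id) %>%            # (a) group by the identifier
  mutate(group_id = cur_group_id()) %>% # (b) one integer ID per group
  ungroup() %>%
  select(-personal_id)                 # (c) drop the identifier

deidentified
```

Calling ungroup() before select() matters here: selecting on a grouped data frame keeps the grouping column, so personal_id would not be dropped.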