How to reclassify/replace values based on priority when there are repeats

Question

I have a data frame where value indicates the status of a drug:

g1 = data.frame ( 
    drug = c('a','a','a','d','d'),
    value = c('fda','trial','case','case','pre')
)

drug value
1    a   fda
2    a trial
3    a  case
4    d  case
5    d   pre

For each drug that appears more than once, I want to replace every value with a single status, chosen by the following order of priority for value:

fda > trial > case > pre 

For example, if drug d is both "case" and "pre", all occurrences of d will be reclassified as "case". The final table should look like this:

  drug value
1    a   fda
2    a   fda
3    a   fda
4    d  case
5    d  case

How can I do this without having to loop through each drug, work out the precedence, and then replace the values?

Recommended answer

Since this is an ordinal variable, you can make g1$value an ordered factor, which is the class R provides for this kind of data. With the levels listed from highest to lowest priority, the highest-priority status in each group is the minimum, so you can use functions like min and max just as you would on a numeric vector:

g1$value <- ordered(g1$value, levels = c("fda", "trial", "case", "pre"))
g1$value
#[1] fda   trial case  case  pre  
#Levels: fda < trial < case < pre
g1$value <- ave(g1$value, g1$drug, FUN=min)
g1
#  drug value
#1    a   fda
#2    a   fda
#3    a   fda
#4    d  case
#5    d  case
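
To see why min picks the highest-priority status, here is a minimal sketch of how an ordered factor compares its values. The vector name status is made up for illustration; only base R is used.

```r
# Hypothetical example vector; levels are listed from highest to lowest priority.
status <- ordered(c("case", "pre", "trial"),
                  levels = c("fda", "trial", "case", "pre"))

# Comparisons follow the level order, not alphabetical order:
# "case" is listed before "pre", so this is TRUE.
status[1] < status[2]

# min() returns the earliest level that is present,
# i.e. the highest-priority status in the vector ("trial" here).
top <- min(status)
as.character(top)

# as.integer() exposes the position of each value in the level order.
as.integer(status)
```

Because the comparison depends only on the levels argument, changing the priority order means changing that one vector.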

Or, with dplyr:

library(dplyr)

g1 %>%
  mutate(value = ordered(value, levels = c("fda", "trial", "case", "pre"))) %>%
  group_by(drug) %>%
  mutate(value = min(value))

Neither the row order in the dataset nor the range of values present in any drug group should affect this result:

g2 = data.frame ( 
    drug = c( "a","a","a","d","d","e","e","e"),
    value = c("fda","trial","case","case","pre","pre","fda","case")
)

#  drug value
#1    a   fda
#2    a trial
#3    a  case
#4    d  case
#5    d   pre
#6    e   pre
#7    e   fda
#8    e  case

g2 %>%
  mutate(value = ordered(value, levels = c("fda", "trial", "case", "pre"))) %>%
  group_by(drug) %>%
  mutate(value = min(value))

## A tibble: 8 x 2
## Groups:   drug [3]
#  drug  value
#  <fct> <ord>
#1 a     fda  
#2 a     fda  
#3 a     fda  
#4 d     case 
#5 d     case 
#6 e     fda  
#7 e     fda  
#8 e     fda 
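
If you would rather keep value as plain character data, the same idea can be wrapped in a small base R helper. This is only a sketch and not part of the original answer: the function name reclassify_by_priority and its arguments are hypothetical, and it ranks values with match() instead of building an ordered factor.

```r
# Hypothetical helper: replace each value with its group's highest-priority value.
# `priority` lists the statuses from highest to lowest priority.
reclassify_by_priority <- function(df, group_col, value_col, priority) {
  # Rank each value by its position in `priority`; unlisted or NA values give NA.
  rank <- match(as.character(df[[value_col]]), priority)
  if (anyNA(rank)) {
    stop("found values that are not listed in `priority`")
  }
  # The smallest rank within a group is that group's highest priority.
  best <- ave(rank, df[[group_col]], FUN = min)
  # Map ranks back to labels; the column is returned as character.
  df[[value_col]] <- priority[best]
  df
}

g2 <- data.frame(
  drug  = c("a", "a", "a", "d", "d", "e", "e", "e"),
  value = c("fda", "trial", "case", "case", "pre", "pre", "fda", "case")
)
priority <- c("fda", "trial", "case", "pre")

res <- reclassify_by_priority(g2, "drug", "value", priority)

# Shuffling the rows should not change the value assigned to each drug.
set.seed(1)
shuffled <- g2[sample(nrow(g2)), ]
res_shuffled <- reclassify_by_priority(shuffled, "drug", "value", priority)
```

Stopping on unlisted values is a design choice in this sketch; the ordered() approach above would silently turn them into NA instead.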
