Create column identifying minimum character from within a group and label ties


Question

I have paired data for 10 subjects (with some missing and some ties). My goal is to select the eye with the best disc_grade (A > B > C) and label ties accordingly from the data frame below.

I'm stuck on how to use R code to select the rows with the best disc_grade for each subject.

df <- structure(list(patientID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 
6, 7, 7, 8, 8, 9, 9, 10, 10), eye = c("R", "L", "R", "L", "R", 
"L", "R", "L", "R", "L", "R", "L", "R", "L", "R", "L", "R", "L", 
"R", "L"), disc_grade = c(NA, "B", "C", "B", "B", "C", "B", "C", 
"B", "A", "B", "B", "C", "B", NA, NA, "B", "C", "B", "C")), .Names = c("patientID", "eye", "disc_grade"), class = c("tbl_df", "data.frame"), row.names = c(NA, -20L))

The desired output is:

   patientID   eye disc_grade
2          1   L          B
4          2   L          B
5          3   R          B
7          4   R          B
10         5   L          A
11         6   Tie        B
14         7   L          B
17         9   R          B
19        10   R          B


Recommended answer

This seems to work:

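library(dplyr)

df %>% 
  group_by(patientID) %>% 
  # keep the row(s) holding the best (alphabetically smallest) grade per patient
  filter(disc_grade == min(disc_grade, na.rm=TRUE)) %>%
  # one remaining row: report that eye; two remaining rows: the eyes are tied
  summarise(eye = if (n()==1) eye else "Tie", disc_grade = first(disc_grade))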

  patientID   eye disc_grade
      (dbl) (chr)      (chr)
1         1     L          B
2         2     L          B
3         3     R          B
4         4     R          B
5         5     L          A
6         6   Tie          B
7         7     L          B
8         9     R          B
9        10     R          B
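The filter() step relies on min() comparing character strings in sort order: for single capital letters "A" sorts first, which happens to match the grading A > B > C. A minimal base-R check, assuming the grades are always the single letters A, B and C as in the example data (the vector `grades` is illustrative only). The last lines show an ordered factor as one possible option if the labels did not sort alphabetically in grade order:

```r
# A small vector of grades; NA marks a missing eye
grades <- c("C", "A", NA, "B")

# min() on a character vector compares strings in sort order, so the
# single letter "A" is the minimum, which is also the best grade
best <- min(grades, na.rm = TRUE)
print(best)

# Comparing with the minimum flags the best entries; the NA entry stays NA
is_best <- grades == best
print(is_best)

# If the labels did not sort in grade order, an ordered factor would make
# min() follow the declared level order instead of the alphabet
ranked <- factor(grades, levels = c("A", "B", "C"), ordered = TRUE)
print(min(ranked, na.rm = TRUE))
```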

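There is a warning for group 8: both of its grades are NA, so min(disc_grade, na.rm=TRUE) has no non-missing values to work with. We still get the desired result because filter() drops rows where the condition evaluates to NA.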
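To see why group 8 drops out of the dplyr result, here is a small base-R reproduction. It is a sketch under two assumptions: subset() stands in for dplyr::filter(), since both keep only rows where the condition is TRUE and treat NA as false, and the value "A" is an arbitrary stand-in for whatever min() falls back to for a group with no non-missing grade (the name `g8` is illustrative):

```r
# Group 8 from the question: both eyes have a missing grade
g8 <- data.frame(patientID = c(8, 8), eye = c("R", "L"),
                 disc_grade = c(NA_character_, NA_character_),
                 stringsAsFactors = FALSE)

# Comparing a missing grade with any value gives NA; "A" stands in here
# for whatever min() falls back to when a group has no non-missing grade
cond <- g8$disc_grade == "A"
print(cond)

# subset(), like dplyr::filter(), keeps only rows where the condition is
# TRUE, so group 8 contributes no rows to the result
kept <- subset(g8, disc_grade == "A")
print(nrow(kept))
```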

With data.table:

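library(data.table)

setDT(df)[, 
  # within each patient, keep the row(s) holding the best grade ...
  .SD[ disc_grade == min(disc_grade, na.rm=TRUE) ][,
    # ... then collapse them into one row, writing "Tie" when two eyes remain
    .( eye = if (.N==1) eye else "Tie", disc_grade = disc_grade[1] )
  ]
, by=patientID]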

Again there is a warning, but this time we do get a row for group 8 (eye "Tie", disc_grade NA), since [ does not ignore NAs. To get around this, you could filter out the NAs before or after the operation (as in other answers). My best idea for doing it during the main operation is pretty convoluted:

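setDT(df)[, 
  # which() keeps only the TRUE positions, so NA comparisons never become row indices
  .SD[ which(disc_grade == min(disc_grade, na.rm=TRUE)) ][,
    # return NULL (no row) for a group left empty, such as group 8
    if (.N >= 1) list( eye = if (.N==1) eye else "Tie", disc_grade = disc_grade[1] )
  ]
, by=patientID]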
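The which() trick can be seen with a plain data.frame: a logical index containing NA makes [.data.frame return rows of NAs, whereas which() returns only the TRUE positions. The base-R sketch below is not part of the original answer; it assumes only base R, and `grades_df` and `best_eye` are hypothetical names. It applies the same idea to the whole question: per patient, keep the best grade(s), return nothing for an all-NA group, and label two equal eyes as "Tie". One hedged caveat: the data.table reference for `[.data.table` states that logical NAs in `i` are treated as FALSE, so in the data.table versions above the row for group 8 may come from `j` still returning length-one values on an empty subset rather than from NA rows; the `if (.N >= 1)` guard removes it either way.

```r
# The question's data rebuilt as a plain data.frame (same values, hypothetical name)
grades_df <- data.frame(
  patientID  = rep(1:10, each = 2),
  eye        = rep(c("R", "L"), times = 10),
  disc_grade = c(NA, "B", "C", "B", "B", "C", "B", "C", "B", "A",
                 "B", "B", "C", "B", NA, NA, "B", "C", "B", "C"),
  stringsAsFactors = FALSE
)

# Group 8 has two missing grades, so the comparison is NA for both rows
g8 <- grades_df[grades_df$patientID == 8, ]
print(nrow(g8[g8$disc_grade == "A", ]))         # NA index: all-NA rows returned
print(nrow(g8[which(g8$disc_grade == "A"), ]))  # which(): no rows returned

# Hypothetical helper: summarise one patient's rows into at most one row
best_eye <- function(d) {
  # A patient with no recorded grade contributes no row (like group 8);
  # checking first also avoids calling min() with nothing to compare
  if (all(is.na(d$disc_grade))) return(NULL)
  # which() keeps only TRUE positions, so an NA grade is never selected
  top <- d[which(d$disc_grade == min(d$disc_grade, na.rm = TRUE)), ]
  data.frame(
    patientID  = top$patientID[1],
    eye        = if (nrow(top) == 1) top$eye else "Tie",
    disc_grade = top$disc_grade[1],
    stringsAsFactors = FALSE
  )
}

# Split by patient, summarise each group, then stack the one-row results;
# rbind() skips the NULL returned for group 8
result <- do.call(rbind, lapply(split(grades_df, grades_df$patientID), best_eye))
rownames(result) <- NULL
print(result)
```

Checking for an all-NA group up front is a design choice: it keeps the empty-group case explicit instead of relying on the warning-producing fallback of min().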
