Select rows based on non-directed combinations of columns

Problem description

I am trying to select the maximum value in a dataframe's third column based on the combinations of the values in the first two columns.

My problem is similar to this one but I can't find a way to implement what I need.

Sample data changed to make the column names more obvious.

Here is some sample data:

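library(tidyr)
set.seed(1234)
# Build all ordered pairs of the four labels, then drop self-pairs (12 rows remain)
df <- data.frame(group1 = letters[1:4], group2 = letters[1:4])
df <- df %>% expand(group1, group2)
df <- subset(df, subset = group1 != group2)
# Assign one random score to each ordered pair
df$score <- runif(n = 12, min = 0, max = 1)
df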

# A tibble: 12 × 3
   group1 group2       score
   <fctr> <fctr>       <dbl>
1       a      b 0.113703411
2       a      c 0.622299405
3       a      d 0.609274733
4       b      a 0.623379442
5       b      c 0.860915384
6       b      d 0.640310605
7       c      a 0.009495756
8       c      b 0.232550506
9       c      d 0.666083758
10      d      a 0.514251141
11      d      b 0.693591292
12      d      c 0.544974836

In this example rows 1 and 4 are 'duplicates': both contain the pair a and b, just in opposite order. I would like to select row 4, as its value in the score column is larger than that of row 1. Ultimately I would like a dataframe to be returned with the group1 and group2 columns and the maximum value in the score column for each such pair. So in this example, I expect 6 rows to be returned.

How can I do this in R?

Recommended answer

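I'd prefer dealing with this problem in two steps: first compute an order-independent ID for each pair of groups, then keep the row with the highest score for each ID: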

library(dplyr)

# Create function for computing group IDs from data frame of groups (per column)
get_group_id <- function(groups) {
  apply(groups, 1, function(row) {
    paste0(sort(row), collapse = "_")
  })
}
group_id <- get_group_id(select(df, -score))

# Perform the computation
df %>%
  mutate(groupId = group_id) %>%
  group_by(groupId) %>%
  slice(which.max(score)) %>%
  ungroup() %>%
  select(-groupId)
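
As a rough sketch of the same idea in base R, the order-independent key can also be built with vectorized pmin() and pmax() instead of a row-wise apply(). This is a swapped-in alternative, not the method used above; the small data frame and the names pair_key, ord, and result are made up for illustration.

```r
# Small hypothetical data set: each unordered pair appears in both directions.
df <- data.frame(
  group1 = c("a", "b", "a", "c", "b", "c"),
  group2 = c("b", "a", "c", "a", "c", "b"),
  score  = c(0.11, 0.62, 0.61, 0.01, 0.86, 0.23),
  stringsAsFactors = FALSE
)

# Order-independent key: the alphabetically smaller label always comes first.
# as.character() guards against factor columns, which pmin()/pmax() reject.
g1 <- as.character(df$group1)
g2 <- as.character(df$group2)
pair_key <- paste(pmin(g1, g2), pmax(g1, g2), sep = "_")

# Visit rows from highest to lowest score and keep the first row per key,
# which is the maximum-score row for that unordered pair.
ord <- order(df$score, decreasing = TRUE)
result <- df[ord, ][!duplicated(pair_key[ord]), ]
result
```

Here (a, b) and (b, a) both map to the key "a_b", so only the higher-scoring of the two rows survives. With two columns this avoids the per-row sort; for more than two grouping columns, the row-wise sort in get_group_id generalizes more directly.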
