Calculate rowwise maximum from columns that have changing names


Problem description

I have the following objects:

s1 = "1_1_1_1_1"
s2 = "2_1_1_1_1"
s3 = "3_1_1_1_1"

Please note that the values of s1, s2, s3 can change in another example.

Then I have the following data frame:

set.seed(666)
df = data.frame(draw = c(1,2,3,4,1,2,3,4,1,2,3,4),
                resp = c(1,1,1,1,2,2,2,2,3,3,3,3),
                "1_1_1_1_1" = runif(12),
                "2_1_1_1_1" = runif(12),
                "3_1_1_1_1" = runif(12))

Please note that the column names of my data frame will change based on the values of s1, s2, s3.

I now want to achieve the following:

  1. I want to find out which of the last three columns in df has the highest value and store it in a new column (the value should be 1, 2 or 3, depending on whether the highest value is in the first, second or third of these columns).
  2. Now that I know which column holds the highest value in each row, I want to group/summarize the result by the column resp and count how often the max value is 1, 2 or 3.

So the outcome from 1. should be:

draw    resp    1_1_1_1_1    2_1_1_1_1    3_1_1_1_1    max
1       1       0.774        0.095        0.806        3
2       1       0.197        0.142        0.266        3
...

And the outcome from 2. should be:

resp    first_max    second_max    third_max
1       1            1             2
2       2            1             1
3       1            2             1

My problem is that tidyverse's rowwise function is deprecated and that I don't know how to dynamically address columns in a tidyverse pipe by column names that are stored externally (here in s1, s2, s3). One last note: I might be overcomplicating things by going by the column names when, in fact, the columns I'm interested in are always at positions 3:5.

Recommended answer

Here is one way to get what you want. For a slightly different format, you can use count rather than table, but this matches your expected output. Hope this helps!

library(dplyr)

df %>%
  mutate(max_val = max.col(select(., starts_with("X")))) %>%
  select(resp, max_val) %>%
  table()

    max_val
resp 1 2 3
   1 1 1 2
   2 2 1 1
   3 1 2 1
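
The `starts_with("X")` selector works because `data.frame()` sanitises column names by default (`check.names = TRUE`), so `"1_1_1_1_1"` is stored as `X1_1_1_1_1`. If the goal is to go through the names held in `s1`, `s2`, `s3` instead, one possible sketch is shown below. It assumes the data frame is built exactly as in the question; `cols` and `res` are names introduced here for illustration, and `ties.method = "first"` is an added choice to make the result deterministic (the default in `max.col` breaks ties at random).

```r
library(dplyr)

s1 <- "1_1_1_1_1"
s2 <- "2_1_1_1_1"
s3 <- "3_1_1_1_1"

set.seed(666)
df <- data.frame(draw = c(1,2,3,4,1,2,3,4,1,2,3,4),
                 resp = c(1,1,1,1,2,2,2,2,3,3,3,3),
                 "1_1_1_1_1" = runif(12),
                 "2_1_1_1_1" = runif(12),
                 "3_1_1_1_1" = runif(12))

# data.frame() ran the names through make.names(), so apply the same
# transformation to the externally stored names before selecting.
cols <- make.names(c(s1, s2, s3))

# For each row, record which of the selected columns (1, 2 or 3)
# holds the largest value.
res <- df %>%
  mutate(max_val = max.col(as.matrix(select(., all_of(cols))),
                           ties.method = "first"))

# Count how often each column wins within each resp group.
table(res$resp, res$max_val)
```

If the data frame were created with `check.names = FALSE`, the columns would keep their original names and `make.names()` would not be needed.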

Alternatively, you can do this:

library(tidyr)  # spread() comes from tidyr

df %>%
  mutate(max_val = max.col(.[3:5])) %>%
  count(resp, max_val) %>%
  mutate(max_val = paste0("max_", max_val)) %>%
  spread(value = n, key = max_val)

   resp max_1 max_2 max_3
  <dbl> <int> <int> <int>
1     1     1     1     2
2     2     2     1     1
3     3     1     2     1
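
In current tidyr releases `spread()` is superseded by `pivot_wider()`. A minimal sketch of the same reshaping with the newer function follows, assuming the data frame from the question; `wide` is a name introduced here for illustration, and `values_fill = 0` and `ties.method = "first"` are added choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

set.seed(666)
df <- data.frame(draw = c(1,2,3,4,1,2,3,4,1,2,3,4),
                 resp = c(1,1,1,1,2,2,2,2,3,3,3,3),
                 "1_1_1_1_1" = runif(12),
                 "2_1_1_1_1" = runif(12),
                 "3_1_1_1_1" = runif(12))

wide <- df %>%
  # Columns 3:5 are addressed by position, as in the answer above.
  mutate(max_val = max.col(.[3:5], ties.method = "first")) %>%
  count(resp, max_val) %>%
  mutate(max_val = paste0("max_", max_val)) %>%
  # One column per max_* label; combinations that never occur become 0.
  pivot_wider(names_from = max_val, values_from = n, values_fill = 0)

wide
```

Filling with 0 keeps the table rectangular when some column never holds the maximum within a resp group; without it those cells would be NA.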
