Calculate rowwise maximum from columns that have changing names
Problem description
I have the following objects:
s1 = "1_1_1_1_1"
s2 = "2_1_1_1_1"
s3 = "3_1_1_1_1"
Please note that the values of s1, s2 and s3 can change in another example.
Then I have the following data frame:
set.seed(666)
df = data.frame(draw = c(1,2,3,4,1,2,3,4,1,2,3,4),
                resp = c(1,1,1,1,2,2,2,2,3,3,3,3),
                "1_1_1_1_1" = runif(12),
                "2_1_1_1_1" = runif(12),
                "3_1_1_1_1" = runif(12))
Please note that the column names of my data frame will change based on the values of s1, s2 and s3.
I now want to achieve the following:
1. I want to find out which of the last three columns in df has the highest value and store it as a value in a new column (the value should be 1, 2 or 3, depending on whether the highest value is in the first, second or third of these columns).
2. Now that I know which column holds the highest value in each row, I want to group/summarize the result by the column resp and count how often the max value is 1, 2 or 3.
So the outcome from 1. should be:
draw resp 1_1_1_1_1 2_1_1_1_1 3_1_1_1_1 max
   1    1     0.774     0.095     0.806   3
   2    1     0.197     0.142     0.266   3
...
The outcome from 2. should be:
resp first_max second_max third_max
   1         1          1         2
   2         2          1         1
   3         1          2         1
My problem is that tidyverse's rowwise function is deprecated and that I don't know how to dynamically address columns in a tidyverse pipe by column names that are stored externally (here in s1, s2, s3). One last note: I might be overcomplicating things by trying to go by the column names when, in fact, the columns I'm interested in are always at column positions 3:5.
Recommended answer
Here is one way to get what you want. For a slightly different format, you can use count rather than table, but this matches your expected output. Hope this helps!
library(dplyr)

df %>%
  # data.frame() (check.names = TRUE) turned "1_1_1_1_1" into "X1_1_1_1_1", hence starts_with("X")
  mutate(max_val = max.col(select(., starts_with("X")))) %>%
  select(resp, max_val) %>%
  table()
    max_val
resp 1 2 3
   1 1 1 2
   2 2 1 1
   3 1 2 1
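The question also asks how to address the columns through the names stored in s1, s2 and s3. A minimal base R sketch of one way to do that follows, assuming the data frame was built with the default check.names = TRUE, so that the stored names have to be passed through make.names() first. The names cols and m are introduced here for illustration only, and ties.method = "first" is chosen so that the result does not depend on the random tie-breaking that max.col() uses by default.

```r
s1 <- "1_1_1_1_1"
s2 <- "2_1_1_1_1"
s3 <- "3_1_1_1_1"

set.seed(666)
df <- data.frame(draw = rep(1:4, times = 3),
                 resp = rep(1:3, each = 4),
                 "1_1_1_1_1" = runif(12),
                 "2_1_1_1_1" = runif(12),
                 "3_1_1_1_1" = runif(12))

# data.frame() rewrites non-syntactic names, so map the stored
# names to what actually ended up in the data frame.
cols <- make.names(c(s1, s2, s3))   # hypothetical helper variable
stopifnot(all(cols %in% names(df)))

# Index (1, 2 or 3) of the column holding the row maximum.
m <- as.matrix(df[cols])
df$max_val <- max.col(m, ties.method = "first")

# Count how often each column wins within each resp group.
table(resp = df$resp, max_val = df$max_val)
```

Inside a dplyr pipe, the same character vector could be handed to select() via all_of(cols) in recent dplyr versions. If the data frame is instead created with check.names = FALSE, the make.names() step is unnecessary and c(s1, s2, s3) can be used as is.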
Alternatively, you can do this:
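library(tidyr)  # provides spread()

df %>%
  mutate(max_val = max.col(.[3:5])) %>%            # columns of interest by position
  count(resp, max_val) %>%                         # one row per resp/max_val pair
  mutate(max_val = paste0("max_", max_val)) %>%
  spread(value = n, key = max_val)                 # long to wide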
   resp max_1 max_2 max_3
  <dbl> <int> <int> <int>
1     1     1     1     2
2     2     2     1     1
3     3     1     2     1
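One thing to watch with the count-then-spread approach is that a column which never holds the maximum produces no max_ column at all. The sketch below shows one possible base R way around that, by turning the winner index into a factor with fixed levels before tabulating. The data frame df_small and the names winner, counts and wide are made up for illustration and are not part of the original question.

```r
# Small hand-made example in which the third column never wins.
df_small <- data.frame(
  resp = c(1, 1, 2, 2),
  a    = c(0.9, 0.8, 0.1, 0.7),
  b    = c(0.1, 0.2, 0.5, 0.2),
  c    = c(0.3, 0.1, 0.2, 0.1)
)

# Column index of the row maximum, by position (columns 2:4 here).
winner <- max.col(as.matrix(df_small[2:4]), ties.method = "first")

# Fixed levels keep a column for every candidate, even with zero wins.
winner <- factor(winner, levels = 1:3)

counts <- table(resp = df_small$resp, max_val = winner)

# Convert the contingency table to a wide data frame.
wide <- as.data.frame.matrix(counts)
names(wide) <- paste0("max_", names(wide))
wide <- cbind(resp = as.numeric(rownames(wide)), wide)
wide
```

The same idea should carry over to the dplyr version by converting max_val to a factor before counting, although whether empty levels are kept there depends on the arguments passed to count().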