Creating a mean variable from variable names and weights supplied by vectors
Problem description
Suppose I want to create a mean variable in a given dataframe based on two vectors, one specifying the names of the variables to use, and one specifying weights by which these variables should go into the mean variable:
vars <- c("a", "b", "c", "d")
weights <- c(0.5, 0.7, 0.8, 0.2)
df <- data.frame(cbind(c(1,4,5,7), c(2,3,7,5), c(1,1,2,3),
c(4,5,3,3), c(3,2,2,1), c(5,5,7,1)))
colnames(df) <- c("a","b","c","d","e","f")
How could I use dplyr::mutate() to create a mean variable that uses vars and weights to calculate a rowwise score? mutate() should specifically use the variables supplied by vars. The result should basically do the following:
# Pseudocode: vars[i] stands for the column named by vars[i]
df <- df %>%
rowwise() %>%
mutate(comp = mean(c(vars[1]*weights[1], vars[2]*weights[2], ...)))
Written out:
df2 <- df %>%
rowwise() %>%
mutate(comp = mean(c(0.5*a, 0.7*b, 0.8*c, 0.2*d)))
I can't figure out how to do this because, although vars contains the exact variable names that I want to use for mutate in my df, inside vars they are strings. How could I make mutate() understand that the strings vars contains relate to columns in my df? If you know another procedure not using mutate(), that's fine also. Thanks!
Recommended answer
You can use
df %>% mutate(wmean = apply(.[vars], 1, weighted.mean, weights))
#   a b c d e f    wmean
# 1 1 2 1 4 3 5 1.590909
# 2 4 3 1 5 2 5 2.681818
# 3 5 7 2 3 2 7 4.363636
# 4 7 5 3 3 1 1 4.545455
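Note that weighted.mean() divides by sum(weights), whereas the expression written out in the question, mean(c(0.5*a, 0.7*b, ...)), divides by the number of variables, so the two give different numbers. A minimal base R sketch of the difference, rebuilding the example data so it runs on its own (the names weighted_sum, by_weight_sum and by_count are made up for illustration):

```r
vars <- c("a", "b", "c", "d")
weights <- c(0.5, 0.7, 0.8, 0.2)
df <- data.frame(a = c(1, 4, 5, 7), b = c(2, 3, 7, 5), c = c(1, 1, 2, 3),
                 d = c(4, 5, 3, 3), e = c(3, 2, 2, 1), f = c(5, 5, 7, 1))

# Row-wise sum of weight * value over the columns named in vars
weighted_sum <- as.vector(as.matrix(df[vars]) %*% weights)

# What weighted.mean() does: normalise by the sum of the weights (2.2 here)
by_weight_sum <- weighted_sum / sum(weights)

# What the question's mean(c(0.5*a, ...)) does: normalise by the
# number of variables (4 here)
by_count <- weighted_sum / length(vars)
```

If the score from the question is what is wanted, by_count is the one to assign; if a conventional weighted mean is wanted, use by_weight_sum, which matches the wmean column.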
That said, there is not much to gain from the tidyverse here, as the base R approaches are almost the same and end up being shorter:
df$wmean <- apply(df[vars], 1, weighted.mean, weights)
or one of the following:
df$wmean <- colSums(t(df[vars]) * weights) / sum(weights)
# drop() turns the one-column matrix from %*% into a plain vector
df$wmean <- drop(as.matrix(df[vars]) %*% weights) / sum(weights)
df$wmean <- rowSums(sweep(df[vars], 2, weights, `*`)) / sum(weights)
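As a quick sanity check, the base R expressions above should all give the same values on the example data. A self-contained sketch that compares them (the names w1 to w4 and same are only for illustration):

```r
vars <- c("a", "b", "c", "d")
weights <- c(0.5, 0.7, 0.8, 0.2)
df <- data.frame(a = c(1, 4, 5, 7), b = c(2, 3, 7, 5), c = c(1, 1, 2, 3),
                 d = c(4, 5, 3, 3), e = c(3, 2, 2, 1), f = c(5, 5, 7, 1))

# The four variants, stored separately instead of in df$wmean
w1 <- apply(df[vars], 1, weighted.mean, weights)
w2 <- colSums(t(df[vars]) * weights) / sum(weights)
w3 <- as.vector(as.matrix(df[vars]) %*% weights) / sum(weights)
w4 <- rowSums(sweep(df[vars], 2, weights, `*`)) / sum(weights)

# Compare values only: the variants may carry different names
same <- sapply(list(w2, w3, w4),
               function(w) isTRUE(all.equal(unname(w), unname(w1))))
stopifnot(all(same))
```

Which variant to pick is mostly a matter of taste; the matrix-based ones avoid the row-by-row apply() call, which may matter for large data frames.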