Creating a mean variable, from variable names and weights supplied by vectors


Question

Suppose I want to create a mean variable in a given data frame based on two vectors: one specifying the names of the variables to use, and one specifying the weights with which these variables should enter the mean variable:

vars <- c("a", "b", "c", "d")
weights <- c(0.5, 0.7, 0.8, 0.2)
df <- data.frame(cbind(c(1,4,5,7), c(2,3,7,5), c(1,1,2,3), 
                       c(4,5,3,3), c(3,2,2,1), c(5,5,7,1)))
colnames(df) <- c("a","b","c","d","e","f")

How could I use dplyr::mutate() to create a mean variable that uses vars and weights to calculate a rowwise score? mutate() should specifically use the variables supplied by vars. The result should basically do the following:

df <- df %>% 
  rowwise() %>% 
  mutate(comp = mean(c(vars[1]*weights[1], vars[2]*weights[2], ...)))

Written out:

df2 <- df %>% 
  rowwise() %>% 
  mutate(comp = mean(c(0.5*a, 0.7*b, 0.8*c, 0.2*d)))

I can't figure out how to do this because, although vars contains the exact variable names that I want to use for mutate() in my df, inside vars they are strings. How could I make mutate() understand that the strings in vars refer to columns in my df? If you know another procedure that does not use mutate(), that's fine too. Thanks!

Recommended answer

You can use

df %>% mutate(wmean = apply(.[vars], 1, weighted.mean, weights))
#   a b c d e f    wmean
# 1 1 2 1 4 3 5 1.590909
# 2 4 3 1 5 2 5 2.681818
# 3 5 7 2 3 2 7 4.363636
# 4 7 5 3 3 1 1 4.545455

but there is not much to gain from the tidyverse here, since base R approaches are almost the same and end up shorter:

df$wmean <- apply(df[vars], 1, weighted.mean, weights)

or one of the following:

df$wmean <- colSums(t(df[vars]) * weights) / sum(weights)
df$wmean <- drop(as.matrix(df[vars]) %*% weights) / sum(weights)  # drop() turns the 4 x 1 matrix into a plain vector
df$wmean <- rowSums(sweep(df[vars], 2, weights, `*`)) / sum(weights)
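Note that weighted.mean() divides by sum(weights), whereas the formula written out in the question, mean(c(0.5*a, 0.7*b, 0.8*c, 0.2*d)), divides by the number of variables. The following is a minimal base R sketch, assuming the example data from the question, that computes both scores and checks how they relate. The names `weighted` and `comp` are chosen here for illustration only.

```r
# Example data from the question
vars <- c("a", "b", "c", "d")
weights <- c(0.5, 0.7, 0.8, 0.2)
df <- data.frame(a = c(1, 4, 5, 7), b = c(2, 3, 7, 5), c = c(1, 1, 2, 3),
                 d = c(4, 5, 3, 3), e = c(3, 2, 2, 1), f = c(5, 5, 7, 1))

# Weighted columns as a numeric matrix: column j is df[[vars[j]]] * weights[j]
weighted <- sweep(as.matrix(df[vars]), 2, weights, `*`)

# Weighted mean as in the answer: sum(w * x) / sum(w)
wmean <- apply(df[vars], 1, weighted.mean, w = weights)
stopifnot(isTRUE(all.equal(unname(wmean),
                           unname(rowSums(weighted)) / sum(weights))))

# Literal formula from the question: sum(w * x) / length(w)
comp <- unname(rowMeans(weighted))  # 0.875, 1.475, 2.4, 2.5

# The two scores differ only by the constant factor sum(weights) / length(weights)
stopifnot(isTRUE(all.equal(comp,
                           unname(wmean) * sum(weights) / length(weights))))
```

If the score from the question is what is wanted, `df$comp <- rowMeans(sweep(as.matrix(df[vars]), 2, weights, `*`))` should give it directly. The ranking of rows is the same under both definitions, since they differ only by a constant factor.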
