Weighted average value in the presence of NA values

Question

Here's a very simple example of what I'm dealing with:

library(data.table)
data_stack <- data.table(CompA_value = c(10,20,30,40), CompB_value = c(60,70,80,80), CompC_value = c(NA, NA, NA, 100), CompA_weight = c(0.2, 0.3,0.4,0.4), CompB_weight = c(0.8,0.7,0.6,0.4), CompC_weight = c(NA, NA, NA,0.2))

   CompA_value CompB_value CompC_value CompA_weight CompB_weight CompC_weight
1:          10          60          NA          0.2          0.8           NA
2:          20          70          NA          0.3          0.7           NA
3:          30          80          NA          0.4          0.6           NA
4:          40          80         100          0.4          0.4          0.2

What I want to do is calculate the weighted average of CompA through C, for each row. However, notice that CompC has NAs for rows 1-3. What I would like is for rows 1-3 to have the weighted average of CompA and CompB, but once CompC becomes active, I'd like to have it automatically included in the calculation.

As it stands, I've done something like this:

> data_stack[, Weighted_average := CompA_value*CompA_weight + CompB_value*CompB_weight + CompC_value * CompC_weight]
> data_stack
   CompA_value CompB_value CompC_value CompA_weight CompB_weight CompC_weight Weighted_average
1:          10          60          NA          0.2          0.8           NA               NA
2:          20          70          NA          0.3          0.7           NA               NA
3:          30          80          NA          0.4          0.6           NA               NA
4:          40          80         100          0.4          0.4          0.2               68

But my "Weighted_average" column comes out as NA for rows 1-3, because any arithmetic involving NA returns NA.

What I want is:

 data_stack[, Weighted_average := c((10*0.2 + 60*0.8),(20*0.3 + 70*0.7),(30*0.4 + 80*0.6),(40*0.4 + 80*0.4 + 100*0.2))]
 data_stack
   CompA_value CompB_value CompC_value CompA_weight CompB_weight CompC_weight Weighted_average
1:          10          60          NA          0.2          0.8           NA               50
2:          20          70          NA          0.3          0.7           NA               55
3:          30          80          NA          0.4          0.6           NA               60
4:          40          80         100          0.4          0.4          0.2               68

So, note how the first three rows are just the weighted average of A and B, but once C becomes available, it is also included into the calculation.

So I'd like to find out how to write code that checks each component for an NA value, skips the component if it is NA, and otherwise includes it in the calculation.

I've got a considerably bigger data table so doing it manually is out of the question!

Thanks.

Recommended answer

Here you go:

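# apply() converts the table to a matrix and passes each row as a named vector
data_stack$Weighted_average = apply(data_stack,1,function(x){
  # value * weight per component; NA if either factor is NA
  y = c(x["CompA_value"]*x["CompA_weight"],
        x["CompB_value"]*x["CompB_weight"],
        x["CompC_value"]*x["CompC_weight"])
  # sum the products, dropping the NA ones
  return(sum(y,na.rm = TRUE))
})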

Result:

> data_stack
  CompA_value CompB_value CompC_value CompA_weight CompB_weight CompC_weight Weighted_average
1          10          60          NA          0.2          0.8           NA               50
2          20          70          NA          0.3          0.7           NA               55
3          30          80          NA          0.4          0.6           NA               60
4          40          80         100          0.4          0.4          0.2               68

The function builds a vector of value*weight products, one per component, and returns their sum with NA values dropped. A component is therefore skipped in any row where its value or its weight is NA.
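Since the real table is much bigger, here is a rough vectorized sketch of the same idea that avoids typing every column name. It is not part of the original answer: the helper name `weighted_row_sum`, its `normalize` argument, and the assumption that every `<name>_value` column has a matching `<name>_weight` column are all illustrative. It uses only base R, so it should work on a plain data.frame as well as a data.table.

```r
# Hypothetical helper (not from the original answer): row-wise sum of
# value * weight that skips any component whose value or weight is NA.
weighted_row_sum <- function(data, value_cols, weight_cols, normalize = FALSE) {
  stopifnot(length(value_cols) == length(weight_cols))
  df <- as.data.frame(data)               # accepts a data.frame or a data.table
  values   <- as.matrix(df[value_cols])
  weights  <- as.matrix(df[weight_cols])
  products <- values * weights            # NA wherever value or weight is NA
  total <- rowSums(products, na.rm = TRUE)
  if (normalize) {
    # Divide by the sum of the weights that were actually used in each row.
    weights[is.na(products)] <- NA
    total <- total / rowSums(weights, na.rm = TRUE)
  }
  total
}

data_stack <- data.frame(
  CompA_value  = c(10, 20, 30, 40),
  CompB_value  = c(60, 70, 80, 80),
  CompC_value  = c(NA, NA, NA, 100),
  CompA_weight = c(0.2, 0.3, 0.4, 0.4),
  CompB_weight = c(0.8, 0.7, 0.6, 0.4),
  CompC_weight = c(NA, NA, NA, 0.2)
)

# Pair columns by name; assumes the "<name>_value" / "<name>_weight" convention.
value_cols  <- grep("_value$", names(data_stack), value = TRUE)
weight_cols <- sub("_value$", "_weight", value_cols)

data_stack$Weighted_average <- weighted_row_sum(data_stack, value_cols, weight_cols)
print(data_stack$Weighted_average)  # expected: 50 55 60 68
```

Two caveats. The plain sum only equals a weighted average when the weights that remain in each row add up to 1, as they do in this example; if they might not, `normalize = TRUE` divides by the sum of the weights actually used. Also, a row in which every component is NA yields 0 (or NaN with `normalize = TRUE`) rather than NA, because summing with `na.rm = TRUE` over nothing returns 0.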
