在 R 中计算带条件的乘法列的平均值 [英] Calculate Mean of Multiply Columns with Condition in R
本文介绍了在 R 中计算带条件的乘法列的平均值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我想计算几个变量的平均值,但有条件,如果这些列中的 2 列有 NA,则平均值将为 NA,如果小于 2,则求平均值
I want to calculate mean of several variables but with condition, if 2 of those columns have NA, mean will be NA, if less than 2, find mean
df <- data.frame(ID = c(1:10),X1 = c(rep(1,5),rep(2,5)),X2 = c(1:10),X3 = c(1,NA,2,NA,NA,1,NA,2,NA,NA),X4 = c(rep(NA,10)),X5=c(rep(1,5),rep(NA,5)),
Y1 = c(rep(1,5),rep(2,5)),Y2 = c(1:10),Y3 = c(1,NA,2,NA,NA,1,NA,2,NA,NA),Y4 = c(rep(NA,10)),Y5=c(rep(1,5),rep(NA,5)))
MeanX = round(apply(df[,c(2:6)],1, mean,na.rm = TRUE),2)
MeanY = round(apply(df[,c(7:11)],1,mean,na.rm = TRUE),2)
这是输出不正确
ID X1 X2 X3 X4 X5 Y1 Y2 Y3 Y4 Y5 MeanX MeanY
1 1 1 1 1 NA 1 1 1 1 NA 1 1.00 1.00
2 2 1 2 NA NA 1 1 2 NA NA 1 1.33 1.33*
3 3 1 3 2 NA 1 1 3 2 NA 1 1.75 1.75
4 4 1 4 NA NA 1 1 4 NA NA 1 2.00 2.00*
5 5 1 5 NA NA 1 1 5 NA NA 1 2.33 2.33*
6 6 2 6 1 NA NA 2 6 1 NA NA 3.00 3.00*
7 7 2 7 NA NA NA 2 7 NA NA NA 4.50 4.50 *
8 8 2 8 2 NA NA 2 8 2 NA NA 4.00 4.00 *
9 9 2 9 NA NA NA 2 9 NA NA NA 5.50 5.50 *
10 10 2 10 NA NA NA 2 10 NA NA NA 6.00 6.00 * This is supposed NA,bc there are 3 columns have NA
因为我有一个很大的数据集,对于每个组,有时我必须设置 20 个中的 6 个,有时 10 个中的 1 个,所以我可以计算平均值,我可以如何为这种情况设置条件.
Because I have a large dataset, for each group sometimes I have to set 6 out of 20,sometimes 1 out of 10, so I can calculate mean, how I can set condition for this case.
推荐答案
这是一个使用 data.table
的非常快速(必须运行)和肮脏的解决方案.但我相信它可以被清理和构建,以制作出整洁且运行良好的东西.
Here is a VERY quick (have to run) and dirty solution with data.table
. But I believe it can be cleaned and built upon to make something that is neat and works well.
# Load data.table
require(data.table)
setDT(df)
# Format all columns as as numeric,
# otherwise mean is not meaningful (see what I did there?)
x.cols <- paste("X", 1:5, sep = "")
y.cols <- paste("Y", 1:5, sep = "")
setDT(df)[, (x.cols) := lapply(.SD, as.integer), .SDcols = x.cols]
setDT(df)[, (y.cols) := lapply(.SD, as.integer), .SDcols = y.cols]
# meanX first mean, and then NA
df[, meanX := mean(c(X1, X2, X3, X4, X5), na.rm = TRUE), by =ID]
df[df[, sum(is.na(c(X1, X2, X3, X4, X5))) > 2, by = ID]$V1, meanX := NA]
# meanY first mean, and then NA
df[, meanY := mean(c(Y1, Y2, Y3, Y4, Y5), na.rm = TRUE), by =ID]
df[df[, sum(is.na(c(Y1, Y2, Y3, Y4, Y5))) > 2, by = ID]$V1, meanY := NA]
# Result
df
ID X1 X2 X3 X4 X5 Y1 Y2 Y3 Y4 Y5 meanX meanY
1: 1 1 1 1 NA 1 1 1 1 NA 1 1.000000 1.000000
2: 2 1 2 NA NA 1 1 2 NA NA 1 1.333333 1.333333
3: 3 1 3 2 NA 1 1 3 2 NA 1 1.750000 1.750000
4: 4 1 4 NA NA 1 1 4 NA NA 1 2.000000 2.000000
5: 5 1 5 NA NA 1 1 5 NA NA 1 2.333333 2.333333
6: 6 2 6 1 NA NA 2 6 1 NA NA 3.000000 3.000000
7: 7 2 7 NA NA NA 2 7 NA NA NA NA NA
8: 8 2 8 2 NA NA 2 8 2 NA NA 4.000000 4.000000
9: 9 2 9 NA NA NA 2 9 NA NA NA NA NA
10: 10 2 10 NA NA NA 2 10 NA NA NA NA NA
这篇关于在 R 中计算带条件的乘法列的平均值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文