R汇总折叠的数据表 [英] R Summarize Collapsed Data.Table
本文介绍了R汇总折叠的数据表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我有这样的数据
data=data.table("School"=c(1,1,1,1,1,1,0,1,0,0,1,1,1,0,1,0,1,1,1,1,1,0,0,1,0,1,1,1,1,1,1,0,1,0,1,0),
"Grade"=c(0,1,1,1,0,0,0,1,1,1,0,1,1,0,0,1,1,1,0,0,1,1,0,1,0,0,1,0,1,1,0,0,0,0,1,0),
"CAT"=c(1,0,1,1,0,1,0,1,1,0,1,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1,1,0,0,1,1,0,1,1,1,1),
"FOX"=c(1,1,0,1,1,1,1,1,0,0,0,1,1,1,0,0,1,1,1,1,1,1,1,0,1,1,0,0,1,0,0,1,0,0,1,0),
"DOG"=c(0,0,0,1,0,0,1,0,0,1,0,1,1,1,0,1,1,0,0,1,1,0,0,1,0,1,1,0,1,0,1,1,1,0,1,1))
,并希望建立一个新的数据表,例如:
and wish to achieve a new data table such as this:
dataWANT=data.frame("VARIABLE"=c('CAT', 'CAT', 'CAT', 'FOX', 'FOX', 'FOX', 'DOG', 'DOG', 'DOG'),
"SCHOOL"=c(1, 1, 0, 1, 1, 0, 1, 1, 0),
"GRADE"=c(0, 1, 1, 0, 1, 1, 0, 1, 1),
"MEAN"=c(NA))
dataWANT用等于1的SCHOOL,GRADE和SCHOOL X GRADE取CAT和FOX和DOG的平均值。
dataWANT takes the mean for CAT and FOX and DOG by SCHOOL, GRADE, and SCHOOL X GRADE when they are equal to 1.
我知道一次如何执行此操作,但这不利于处理大数据。
I know how to do this one at a time but that is not good for doing this with a big data.
data[, CAT1:=mean(CAT), by=list(SCHOOL)]
data[, FOX1:=mean(FOX), by=list(GRADE)]
data[, DOG1:=mean(DOG), by=list(SCHOOL, GRADE)]
data$CAT2 = unique(data[SCHOOL==1, CAT1])
data$FOX2 = unique(data[GRADE==1, FOX1])
data$DOG2 = unique(data[SCHOOL==1 & GRADE==1, DOG1])
仅使用此命令:
Please only use this:
data=data.table("SCHOOL"=c(1,1,1,1,1,1,0,1,0,0,1,1,1,0,1,0,1,1,1,1,1,0,0,1,0,1,1,1,1,1,1,0,1,0,1,0),
"GRADE"=c(0,1,1,1,0,0,0,1,1,1,0,1,1,0,0,1,1,1,0,0,1,1,0,1,0,0,1,0,1,1,0,0,0,0,1,0),
"CAT"=c(1,0,1,1,0,1,0,1,1,0,1,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1,1,0,0,1,1,0,1,1,1,1),
"FOX"=c(1,0,0,1,1,1,1,1,0,0,0,1,1,1,0,0,1,1,1,1,1,1,1,0,1,1,0,0,1,0,0,1,0,0,1,0),
"DOG"=c(0,0,0,1,0,0,1,0,0,1,0,1,1,1,0,1,1,0,0,1,1,0,0,1,0,1,1,0,1,0,1,1,1,0,1,1))
data[, CAT1:=mean(CAT), by=list(SCHOOL)]
data[, CAT2:=mean(CAT), by=list(GRADE)]
data[, CAT3:=mean(CAT), by=list(SCHOOL, GRADE)]
data[, FOX1:=mean(FOX), by=list(SCHOOL)]
data[, FOX2:=mean(FOX), by=list(GRADE)]
data[, FOX3:=mean(FOX), by=list(SCHOOL, GRADE)]
data[, DOG1:=mean(DOG), by=list(SCHOOL)]
data[, DOG2:=mean(DOG), by=list(GRADE)]
data[, DOG3:=mean(DOG), by=list(SCHOOL, GRADE)]
dataWANT=data.frame("VARIABLE"=c('CAT','CAT','CAT','FOX','FOX','FOX','DOG','DOG','DOG'),
"TYPE"=c(1,2,3,1,2,3,1,2,3),
"MEAN"=c(0.48,0.44,0.428,0.6,0.611,0.6428,0.52,0.61,0.6428))
其中,当SCHOOL估计MEAN时,TYPE等于1,
where TYPE equals to 1 when MEAN in estimated by SCHOOL,
TYPE等于2,
TYPE等于3
推荐答案
我们可以在创建列表后使用
,方法是在 rbindlist
融化
数据集后获取 MEAN
(与另一篇文章一样)
We could use rbindlist
after creating a list
by taking the MEAN
after melt
ing the dataset (as in the other post)
library(data.table)
cols <- c('CAT', 'FOX', 'DOG')
data1 <- melt(data, measure.vars = cols)
list_cols <- list('SCHOOL', 'GRADE', c('SCHOOL', 'GRADE'))
lst1 <- lapply(list_cols, function(x)
data1[, .(MEAN = mean(value, na.rm = TRUE)), c(x, 'variable')])
rbindlist(lapply(lst1, function(x) {
nm1 <- setdiff(names(x), c('variable', 'MEAN'))
x[Reduce(`&`, lapply(mget(nm1), as.logical)),
.(VARIABLE = variable, MEAN)]}), idcol = 'TYPE')[order(VARIABLE)]
# TYPE VARIABLE MEAN
#1: 1 CAT 0.4800000
#2: 2 CAT 0.4444444
#3: 3 CAT 0.4285714
#4: 1 FOX 0.6000000
#5: 2 FOX 0.5555556
#6: 3 FOX 0.6428571
#7: 1 DOG 0.5200000
#8: 2 DOG 0.6111111
#9: 3 DOG 0.6428571
这篇关于R汇总折叠的数据表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文