Getting factor means into the dataset after calculation
Question
I am trying to create a normalized value for a variable I am working with, based on individual conference means and SDs. I found the conference means using:
confavg=aggregate(base$AVG, by=list(base$confName), FUN=mean)
After getting the means for the 31 conferences, I want to go back and attach the matching mean to each individual player, so I can easily calculate a normalization factor based on the conference mean.
I have tried writing large ifelse or if statements, where confavg holds the conference averages:
ifelse((base$confName=="America East Conference"),confavg[1,2]->base$CAVG,0->base$CAVG)
but nothing works. Ideally I would want to take every player and say:
Normalization = (player average - conference average)/conference standard deviation
How should I do this?
Here is some sample data:
AVG = c(.350,.400,.320,.220,.100,.250,.400,.450)
Conf = c("SEC","ACC","SEC","B12","P12","ACC","B12","P12")
Conf=as.factor(Conf)
sampleconfavg=aggregate(AVG, by=list(Conf), FUN=mean)
sampleconfsd=aggregate(AVG, by=list(Conf), FUN=sd)
So each player would get (their average - the conference average) / the conference SD.
So for the first player it would be:
(.350 - .335) / 0.0212132 = 0.7071069
but I am hoping to build a function that does this for everyone in my dataset. Thank you!
edit2:
Alright, the answer below is amazing, but I am running into (hopefully) one last problem. I basically want to apply this process to three variables, like so:
base3=do.call(rbind, by(base3, base3$confName, FUN=function(x) { x$ScaledAVG <- scale(x$AVG); x}))
base3=do.call(rbind, by(base3, base3$confName, FUN=function(x) { x$ScaledOBP <- scale(x$OBP); x}))
base3=do.call(rbind, by(base3, base3$confName, FUN=function(x) { x$ScaledK.AB <- scale(x$K.AB); x}))
This works, but then when I filter the data like this:
base3[((base3$ScaledAVG>2)&(base3$ScaledOBP>2)&(base3$ScaledK.AB<.20)),]
it resets the ScaledK.AB value and doesn't use it as one of the filter conditions.
Recommended answer
Here is an example that scales iris$Sepal.Length within groups of iris$Species:
scaled.iris <- do.call(rbind,
  by(iris, iris$Species,
    FUN=function(x) { x$Scaled.Sepal.Length <- scale(x$Sepal.Length); x }
  )
)
head(scaled.iris)
##          Sepal.Length Sepal.Width Petal.Length Petal.Width Species Scaled.Sepal.Length
## setosa.1          5.1         3.5          1.4         0.2  setosa          0.26667447
## setosa.2          4.9         3.0          1.4         0.2  setosa         -0.30071802
## setosa.3          4.7         3.2          1.3         0.2  setosa         -0.86811050
## setosa.4          4.6         3.1          1.5         0.2  setosa         -1.15180675
## setosa.5          5.0         3.6          1.4         0.2  setosa         -0.01702177
## setosa.6          5.4         3.9          1.7         0.4  setosa          1.11776320
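One thing worth knowing about the example above: scale() returns a one-column matrix (with "scaled:center" and "scaled:scale" attributes), not a plain numeric vector, so the new column is stored as a matrix inside the data frame. Whether that has anything to do with the filtering problem in edit2 is only a guess, but a small sketch of how to check for it and flatten the result looks like this (the names x, s and v are made up for illustration):

```r
# scale() centers and scales, but hands back a matrix rather than a vector.
x <- c(1, 2, 3)
s <- scale(x)

is.matrix(s)        # TRUE: a 3 x 1 matrix
names(attributes(s)) # includes "scaled:center" and "scaled:scale"

# as.numeric() drops the dim and the extra attributes, leaving a plain vector
# that behaves like any other numeric column in comparisons and subsetting.
v <- as.numeric(s)
v
```

Wrapping the call as as.numeric(scale(x$Sepal.Length)) inside the by() function would store an ordinary numeric column instead.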
Using your sample data (Conf and AVG only):
d <- data.frame(Conf, AVG)
dd <- do.call(rbind, by(d, d$Conf, FUN=function(x) { x$Scaled <- scale(x$AVG); x}))
# Remove generated row names
rownames(dd) <- NULL
dd
##   Conf  AVG     Scaled
## 1  ACC 0.40  0.7071068
## 2  ACC 0.25 -0.7071068
## 3  B12 0.22 -0.7071068
## 4  B12 0.40  0.7071068
## 5  P12 0.10 -0.7071068
## 6  P12 0.45  0.7071068
## 7  SEC 0.35  0.7071068
## 8  SEC 0.32 -0.7071068
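If you also want the conference mean and SD themselves as columns (which is what the question title asks for), an alternative to the by()/rbind approach is base R's ave(), which returns a vector the same length as its input with the group statistic repeated for each row, and keeps the original row order. This is only a sketch on the sample data; the column names ConfMean, ConfSD, Normalized and the vars vector are made up for illustration, and for the real data you would presumably list AVG, OBP and K.AB in vars:

```r
# Sample data from the question
AVG  <- c(.350, .400, .320, .220, .100, .250, .400, .450)
Conf <- factor(c("SEC", "ACC", "SEC", "B12", "P12", "ACC", "B12", "P12"))
d <- data.frame(Conf, AVG)

# ave() applies FUN within each level of Conf and writes the result
# back in the original row positions (FUN must be passed by name).
d$ConfMean   <- ave(d$AVG, d$Conf, FUN = mean)
d$ConfSD     <- ave(d$AVG, d$Conf, FUN = sd)
d$Normalized <- (d$AVG - d$ConfMean) / d$ConfSD

# Same idea for several variables at once; as.numeric() flattens the
# one-column matrix that scale() returns.
vars <- c("AVG")  # assumption: extend with the other column names as needed
for (v in vars) {
  d[[paste0("Scaled", v)]] <- ave(d[[v]], d$Conf,
                                  FUN = function(z) as.numeric(scale(z)))
}

d
```

Because nothing is split and re-bound, the rows stay in their original order and no generated row names need to be removed afterwards.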