Using do.call factor to scale - resetting value error


Question

This is an extension of a question that I asked here: Getting Factor Means into the dataset after calculation

Now that I have basically normalized all of the stats I am interested in, I want to search the data set for players who meet several conditions on those scaled values at once. I am searching the dataset like this:

base3[((base3$ScaledAVG>2)&(base3$ScaledOBP>2)&(base3$ScaledK.AB<.20)),]

I am looking for the players for whom all three conditions are true. Yet when I run this, it resets the ScaledK.AB value to either .5, 1 or 2 and then doesn't search using that parameter. Is there something wrong with searching the data set this way, or is there a better way to find people in a dataset along these lines?

Here is some sample data, although it doesn't show the same problems I get with the 4000 records I actually have:

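AVG  <- c(.350, .400, .320, .220, .100, .250, .400, .450)
Conf <- factor(c("SEC", "ACC", "SEC", "B12", "P12", "ACC", "B12", "P12"))
OBP  <- c(.360, .420, .360, .260, .160, .260, .460, .410)
K.AB <- c(.11, .10, .09, .25, .20, .19, .05, .09)
d <- data.frame(Conf, AVG, OBP, K.AB)
# Scale each statistic within its conference. Each step starts from the previous
# result and writes to its own column, so earlier scaled columns are not lost.
dd <- do.call(rbind, by(d,  d$Conf,  FUN = function(x) { x$ScaledAVG  <- as.numeric(scale(x$AVG));  x }))
dd <- do.call(rbind, by(dd, dd$Conf, FUN = function(x) { x$ScaledOBP  <- as.numeric(scale(x$OBP));  x }))
dd <- do.call(rbind, by(dd, dd$Conf, FUN = function(x) { x$ScaledK.AB <- as.numeric(scale(x$K.AB)); x }))
# Keep the rows that satisfy all three conditions
dd[(dd$ScaledAVG > 2) & (dd$ScaledOBP > 2) & (dd$ScaledK.AB < .20), ]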

Thanks!

Recommended answer

You may want to drop the do.call(rbind, by(...)) strategy in favor of calling scale directly. The scale function accepts a data.frame of numeric columns (it is coerced to a matrix) and standardizes each column.

> dd <- scale(d[ ,c("AVG", "OBP", "K.AB")])
> dd
             AVG        OBP       K.AB
[1,]  0.33566727  0.2348519 -0.3608439
[2,]  0.76878633  0.8281619 -0.5051815
[3,]  0.07579584  0.2348519 -0.6495191
[4,] -0.79044229 -0.7539981  1.6598820
[5,] -1.82992803 -1.7428481  0.9381942
[6,] -0.53057085 -0.7539981  0.7938566
[7,]  0.76878633  1.2237019 -1.2268693
[8,]  1.20190539  0.7292769 -0.6495191
attr(,"scaled:center")
    AVG     OBP    K.AB 
0.31125 0.33625 0.13500 
attr(,"scaled:scale")
       AVG        OBP       K.AB 
0.11544170 0.10112757 0.06928203 

> d[ dd[, 'AVG'] > 2 & dd[ ,'OBP'] >2 & dd[ ,'K.AB'] < 0.2 , ]
[1] Conf AVG  OBP  K.AB
<0 rows> (or 0-length row.names)

It should not be too surprising that no rows meet all of those conditions: a scaled value above 2 means more than two standard deviations above the mean, which is rather unlikely in a small dataset.
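To keep the scaled values next to the original columns, which is what the filter in the question expects, one option is to bind them onto d under prefixed names. This is a minimal sketch, assuming the column names ScaledAVG, ScaledOBP and ScaledK.AB from the question; d2 and hits are hypothetical names chosen for illustration.

```r
# Sample data from the question
d <- data.frame(
  Conf = factor(c("SEC", "ACC", "SEC", "B12", "P12", "ACC", "B12", "P12")),
  AVG  = c(.350, .400, .320, .220, .100, .250, .400, .450),
  OBP  = c(.360, .420, .360, .260, .160, .260, .460, .410),
  K.AB = c(.11, .10, .09, .25, .20, .19, .05, .09)
)

stats <- c("AVG", "OBP", "K.AB")

# scale() returns a matrix; convert it to a data frame and prefix the names
scaled <- as.data.frame(scale(d[, stats]))
names(scaled) <- paste0("Scaled", stats)

# d2 (hypothetical name) holds the raw and the scaled columns side by side
d2 <- cbind(d, scaled)

# Thresholds from the question, applied to whole columns
hits <- d2[d2$ScaledAVG > 2 & d2$ScaledOBP > 2 & d2$ScaledK.AB < 0.20, ]
nrow(hits)
```

On the sample data this selects no rows, consistent with the empty result shown above. The thresholds probably need loosening before a dataset this small returns anything.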

To apply scale within levels of Conf:

> dd <- lapply(d[ ,c("AVG", "OBP", "K.AB")], function(x) ave(x, d[,"Conf"] , FUN=scale) )
> dd
$AVG
[1]  0.7071068  0.7071068 -0.7071068 -0.7071068 -0.7071068 -0.7071068  0.7071068  0.7071068

$OBP
[1]        NaN  0.7071068        NaN -0.7071068 -0.7071068 -0.7071068  0.7071068  0.7071068

$K.AB
[1]  0.7071068 -0.7071068 -0.7071068  0.7071068  0.7071068  0.7071068 -0.7071068 -0.7071068

> data.frame(dd)
         AVG        OBP       K.AB
1  0.7071068        NaN  0.7071068
2  0.7071068  0.7071068 -0.7071068
3 -0.7071068        NaN -0.7071068
4 -0.7071068 -0.7071068  0.7071068
5 -0.7071068 -0.7071068  0.7071068
6 -0.7071068 -0.7071068  0.7071068
7  0.7071068  0.7071068 -0.7071068
8  0.7071068  0.7071068 -0.7071068

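I do not think it works too well here because the offered test case is too small. With only two players per conference, each scaled value can only be +0.7071 or -0.7071, and it becomes NaN when both players share the same value (as with OBP in the SEC), because the group standard deviation is then zero.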
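The within-conference approach can also write its results back onto the data frame so that the filter from the question runs against named columns. This is a sketch under stated assumptions: the helper zscore and the names d3, keep and hits are hypothetical, and the thresholds are loosened to 0 only so that the toy data return some rows. Wrapping the condition in which() drops any row whose comparison evaluates to NA (for example, from a NaN scaled value) instead of returning a row of NAs.

```r
# Sample data from the question
d <- data.frame(
  Conf = factor(c("SEC", "ACC", "SEC", "B12", "P12", "ACC", "B12", "P12")),
  AVG  = c(.350, .400, .320, .220, .100, .250, .400, .450),
  OBP  = c(.360, .420, .360, .260, .160, .260, .460, .410),
  K.AB = c(.11, .10, .09, .25, .20, .19, .05, .09)
)

stats <- c("AVG", "OBP", "K.AB")

# Hypothetical helper: z-score returned as a plain numeric vector.
# A group with zero standard deviation yields NaN.
zscore <- function(x) (x - mean(x)) / sd(x)

d3 <- d
for (s in stats) {
  # ave() applies zscore within each level of Conf and preserves row order
  d3[[paste0("Scaled", s)]] <- ave(d[[s]], d$Conf, FUN = zscore)
}

# which() keeps only the positions where the condition is TRUE, ignoring NA
keep <- which(d3$ScaledAVG > 0 & d3$ScaledOBP > 0 & d3$ScaledK.AB < 0)
hits <- d3[keep, ]
hits
```

On the full 4000-record dataset, the original cutoffs (2, 2 and 0.20) could be substituted back into the which() condition.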
