Finding Optimal Lambda for Box-Cox Transform in R

Question

I am trying to transform data in a vector in R.

This is not for linear regression, so I don't have a predictor and response relationship. I am simply using a model whose accuracy improves when the data are normalized. (Hence I can't use the boxcox function, since it only works with linear models.)

The data I want to transform is:

vect
 [1]  99.64  49.71 246.84  96.17  16.67 352.00 421.25  81.77 105.00  37.85

I looked at this post.

It was not clear what was being done or how the optimize function was being used, but I did manage to modify the function from that post to create one that I would like to minimize.

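library(e1071)  # assumed source of skewness(); moments::skewness() would also work
xskew <- function(data, par) {
  abs(skewness((data^par - 1) / par))  # absolute skewness of the Box-Cox transformed data
}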

I would like to input a sequence of values for lambda (perhaps between 0.5 and 1 in steps of 0.01) and find which of those values minimizes xskew for my dataset.

I have tried to do this with the optim function but with no luck, so I think this might not be the right function for me. How do I perform this calculation?

Edit: I would like something along the lines of:

 x <- seq(0.51,0.99,by=0.01)
 which(xskew(vect,x) < 0.05)

So perhaps I would find a value under some threshold. This code obviously produces an error.

Recommended Answer

Note that an intercept-only formula such as y ~ 1 counts as a linear model in R, so you can use the boxcox function from MASS:

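library(MASS)                # provides boxcox()
tmp <- exp(rnorm(10))        # simulated positive data
out <- boxcox(lm(tmp ~ 1))   # profile log-likelihood over a grid of lambda values
# approximate 95% confidence interval for lambda
range(out$x[out$y > max(out$y) - qchisq(0.95, 1) / 2])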
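
As a rough sketch, the same approach can be applied to the vector from the question. The grid step of 0.01 and the names `lambda_hat` and `ci` are choices made here for illustration; they are not part of the original answer. Setting `plotit = FALSE` simply suppresses the plot, so `out$x` is the lambda grid as supplied.

```r
library(MASS)  # provides boxcox()

# Data from the question
vect <- c(99.64, 49.71, 246.84, 96.17, 16.67,
          352.00, 421.25, 81.77, 105.00, 37.85)

# Profile log-likelihood on a finer grid than the default; no plot is drawn
out <- boxcox(lm(vect ~ 1), lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Grid value with the highest profile log-likelihood
lambda_hat <- out$x[which.max(out$y)]

# Approximate 95% interval, using the same cutoff as above
ci <- range(out$x[out$y > max(out$y) - qchisq(0.95, 1) / 2])

lambda_hat
ci
```

With only ten observations the interval is likely to be wide, which is worth keeping in mind before treating `lambda_hat` as a precise estimate.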

I think that the most important part of that function is not that it finds a "best" lambda, but that it finds the confidence interval for lambda, which encourages you to think about what the different transformations mean and to combine that with the science behind the data. If the "best" lambda for your data is 0.41, but the interval contains 0.5 and there is scientific reasoning why a square root transform makes sense, then why use 0.41 instead of 0.5?
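
If a skewness-based search like the one described in the question is still wanted, one possible sketch is to evaluate the criterion one lambda at a time with `sapply`, since a vector of lambdas cannot be passed straight into the transform. The helpers `skew` and `box_cox` below are hypothetical names written out for this example so that no extra package is needed. `skew` is the plain moment-based estimate and may differ by a small-sample factor from the packaged `skewness` functions, which affects the 0.05 threshold but not which lambda minimizes the criterion.

```r
# Moment-based sample skewness (hypothetical helper, no package required)
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Box-Cox transform for a single lambda; lambda near 0 uses the log limit
box_cox <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Data from the question
vect <- c(99.64, 49.71, 246.84, 96.17, 16.67,
          352.00, 421.25, 81.77, 105.00, 37.85)

# Grid search: one absolute skewness value per candidate lambda
lambdas  <- seq(0.51, 0.99, by = 0.01)
abs_skew <- sapply(lambdas, function(l) abs(skew(box_cox(vect, l))))

best_grid <- lambdas[which.min(abs_skew)]  # grid value with the smallest |skewness|
under_thr <- lambdas[abs_skew < 0.05]      # may be empty if no value meets the threshold

# Continuous alternative over a wider, arbitrarily chosen interval
best_cont <- optimize(function(l) abs(skew(box_cox(vect, l))),
                      interval = c(-2, 2))$minimum
```

Restricting the grid to 0.51 through 0.99 follows the question; if the minimum lands on the edge of that range, the continuous search over the wider interval is probably the more informative of the two.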
