How to force zero intercept in linear regression?

Problem description

I'm a bit of a newbie, so apologies if this question has already been answered; I've had a look and couldn't find specifically what I was looking for.

I have some more or less linear data of the form

x = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 20.0, 40.0, 60.0, 80.0]
y = [0.50505332505407008, 1.1207373784533172, 2.1981844719020001, 3.1746209003398689, 4.2905482471260044, 6.2816226678076958, 11.073788414382639, 23.248479770546009, 32.120462301367183, 44.036117671229206, 54.009003143831116, 102.7077685684846, 185.72880217806673, 256.12183145545811, 301.97120103079675]

I am using scipy.optimize.leastsq to fit a linear regression to this:

import numpy
import scipy.optimize

def lin_fit(x, y):
    '''Fits a linear model of the form m*x + b to the data'''
    x = numpy.asarray(x)    # convert lists to arrays so the arithmetic below is element-wise
    y = numpy.asarray(y)
    fitfunc = lambda params, x: params[0] * x + params[1]    # fitting function of the form m*x + b
    errfunc = lambda p, x, y: fitfunc(p, x) - y              # error function for the least-squares fit

    init_a = 0.5                            # initial value for a (gradient)
    init_b = min(y)                         # initial value for b (y-axis intercept)
    init_p = numpy.array((init_a, init_b))  # bundle initial values into an initial parameter array

    # calculate the best-fitting parameters (i.e. m and b) using the error function
    p1, success = scipy.optimize.leastsq(errfunc, init_p.copy(), args=(x, y))
    f = fitfunc(p1, x)          # create a fit with those parameters
    return p1, f

And it works beautifully (although I am not sure scipy.optimize is the right thing to use here; it might be a bit over the top?).
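(As a side note on that "over the top" worry: for a plain, unweighted two-parameter fit, numpy.polyfit does the same job in one line. This is only a cross-check sketch with the data from above; polyfit cannot force a zero intercept, so it does not answer the actual question.)

```python
import numpy

# The asker's data, copied from above
x = numpy.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0,
                 6.0, 8.0, 10.0, 20.0, 40.0, 60.0, 80.0])
y = numpy.array([0.50505332505407008, 1.1207373784533172, 2.1981844719020001,
                 3.1746209003398689, 4.2905482471260044, 6.2816226678076958,
                 11.073788414382639, 23.248479770546009, 32.120462301367183,
                 44.036117671229206, 54.009003143831116, 102.7077685684846,
                 185.72880217806673, 256.12183145545811, 301.97120103079675])

# polyfit with degree 1 solves the same unweighted least-squares problem;
# it returns coefficients highest power first, so slope comes before intercept
m, b = numpy.polyfit(x, y, 1)
```

For this data the fitted intercept b comes out noticeably above zero, which is exactly what prompts the question.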

However, due to the way the data points lie, it does not give me a y-axis intercept at 0. I do know, though, that it has to be zero in this case: if x = 0 then y = 0.

Is there any way I can force this?

Recommended answer

I am not adept at these modules, but I have some experience in statistics, so here is what I see. You need to change your fit function from

fitfunc = lambda params, x: params[0] * x + params[1]  

to:

fitfunc = lambda params, x: params[0] * x 

Also delete the line:

init_b = min(y) 

and change the next line to:

init_p = numpy.array([init_a])  # note the brackets: (init_a) alone is just a scalar, not a tuple

This should get rid of the second parameter that is producing the y-intercept and make the fitted line pass through the origin. There might be a couple more minor alterations you have to make to the rest of your code.
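Putting those edits together, the whole function would look something like this; a sketch under the same assumptions as the original (the name lin_fit_origin is mine):

```python
import numpy
import scipy.optimize

def lin_fit_origin(x, y):
    '''Fits a line of the form m*x (forced through the origin) to the data'''
    x = numpy.asarray(x, dtype=float)   # make sure the arithmetic is element-wise
    y = numpy.asarray(y, dtype=float)
    fitfunc = lambda params, x: params[0] * x       # model with no intercept term
    errfunc = lambda p, x, y: fitfunc(p, x) - y     # residuals for the least-squares fit

    init_p = numpy.array([0.5])  # a single parameter now: the initial slope guess
    p1, success = scipy.optimize.leastsq(errfunc, init_p.copy(), args=(x, y))
    f = fitfunc(p1, x)           # evaluate the fit with the optimized slope
    return p1, f
```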

That said, I'm not sure this module will work if you just pluck the second parameter away like this; whether it accepts the modification depends on the module's internal workings. For example, I don't know where params, the list of parameters, is initialized, so I don't know whether doing just this will change its length.

And as an aside, since you mentioned it: this actually is, I think, a bit of an over-the-top way to optimize just a slope. You could read up on linear regression a little and write a small bit of code to do it yourself after some back-of-the-envelope calculus. It's pretty simple and straightforward, really. In fact, I just did some calculations, and the optimized slope will just be &lt;xy&gt;/&lt;x^2&gt;, i.e. the mean of the x*y products divided by the mean of the x^2's.
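That formula drops out of setting the derivative of the squared error to zero; a minimal sketch (the helper name origin_slope is my own):

```python
import numpy

def origin_slope(x, y):
    '''Closed-form least-squares slope for a line through the origin:
    setting d/dm of sum((m*x - y)**2) to zero gives m = mean(x*y) / mean(x**2).'''
    x = numpy.asarray(x, dtype=float)
    y = numpy.asarray(y, dtype=float)
    return numpy.mean(x * y) / numpy.mean(x * x)
```

Since the 1/n factor in both means cancels, this is the same as sum(x*y)/sum(x**2), and it is the value the one-parameter leastsq fit above should converge to.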
