Sigmoidal regression with scipy, numpy, python, etc


Question

I have two variables (x and y) that have a somewhat sigmoidal relationship with each other, and I need to find some sort of prediction equation that will enable me to predict the value of y, given any value of x. My prediction equation needs to show the somewhat sigmoidal relationship between the two variables. Therefore, I cannot settle for a linear regression equation that produces a line. I need to see the gradual, curvilinear change in slope that occurs at both the right and left of the graph of the two variables.

I started using numpy.polyfit after googling curvilinear regression and python, but that gave me the awful results you can see if you run the code below. Can anyone show me how to rewrite the code below to get the type of sigmoidal regression equation that I want?

If you run the code below, you can see that it gives a downward-facing parabola, which is not what the relationship between my variables should look like. Instead, there should be more of a sigmoidal relationship between my two variables, but with a tight fit to the data that I am using in the code below. The data in the code below are means from a large-sample research study, so they pack more statistical power than their five data points might suggest. I do not have the actual data from the large-sample research study, but I do have the means below and their standard deviations (which I am not showing). I would prefer to just plot a simple function with the mean data listed below, but the code could get more complex if complexity would offer substantial improvements.

How can I change my code to show a best fit of a sigmoidal function, preferably using scipy, numpy, and python? Here is the current version of my code, which needs to be fixed:

import numpy as np
import matplotlib.pyplot as plt

# Create numpy data arrays
x = np.array([821,576,473,377,326])
y = np.array([255,235,208,166,157])

# Use polyfit and poly1d to create the regression equation
z = np.polyfit(x, y, 3)
p = np.poly1d(z)
xp = np.linspace(100, 1600, 1500)
pxp=p(xp)

# Plot the results
plt.plot(x, y, '.', xp, pxp, '-')
plt.ylim(140,310)
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True)
plt.show()


EDIT BELOW: (Re-framed the question)

Your response, and its speed, are very impressive. Thank you, unutbu. But, in order to produce more valid results, I need to re-frame my data values. This means re-casting x values as a percentage of the max x value, while re-casting y values as a percentage of the x-values in the original data. I tried to do this with your code, and came up with the following:

import numpy as np 
import matplotlib.pyplot as plt 
import scipy.optimize 

# Create numpy data arrays 
'''
# Comment out original data
#x = np.array([821,576,473,377,326]) 
#y = np.array([255,235,208,166,157]) 
'''

# Re-calculate x values as a percentage of the first (maximum)
# original x value above
x = np.array([1.000,0.702,0.576,0.459,0.397])

# Recalculate y values as a percentage of their respective x values
# from original data above
y = np.array([0.311,0.408,0.440,0.440,0.482])

def sigmoid(p,x): 
    x0,y0,c,k=p 
    y = c / (1 + np.exp(-k*(x-x0))) + y0 
    return y 

def residuals(p,x,y): 
    return y - sigmoid(p,x) 

p_guess=(600,200,100,0.01) 
(p,  
 cov,  
 infodict,  
 mesg,  
 ier)=scipy.optimize.leastsq(residuals,p_guess,args=(x,y),full_output=1,warning=True)  

'''
# comment out original xp to allow for better scaling of
# new values
#xp = np.linspace(100, 1600, 1500) 
'''

xp = np.linspace(0, 1.1, 1100) 
pxp=sigmoid(p,xp) 

x0,y0,c,k=p 
print('''\
x0 = {x0}
y0 = {y0}
c = {c}
k = {k}
'''.format(x0=x0,y0=y0,c=c,k=k)) 

# Plot the results 
plt.plot(x, y, '.', xp, pxp, '-') 
plt.ylim(0,1) 
plt.xlabel('x') 
plt.ylabel('y') 
plt.grid(True) 
plt.show()

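Can you show me how to fix this revised code?
NOTE: By re-casting the data, I have essentially rotated the 2d (x,y) sigmoid about the z-axis by 180 degrees. Also, the 1.000 is not really a maximum of the x values. Instead, 1.000 is a mean of the range of values from different test participants in a maximum test condition.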

Thank you, ubuntu. I carefully read through your code and looked up aspects of it in the scipy documentation. Since your name seems to pop up as a writer of the scipy documentation, I am hoping you can answer the following questions:

1.) Does leastsq() call residuals(), which then returns the difference between the input y-vector and the y-vector returned by the sigmoid() function? If so, how does it account for the difference in the lengths of the input y-vector and the y-vector returned by the sigmoid() function?

2.) It looks like I can call leastsq() for any math equation, as long as I access that math equation through a residuals function, which in turn calls the math function. Is this true?

3.) Also, I notice that p_guess has the same number of elements as p. Does this mean that the four elements of p_guess correspond in order, respectively, with the values returned by x0, y0, c, and k?

4.) Is the p that is sent as an argument to the residuals() and sigmoid() functions the same p that will be output by leastsq(), and is the leastsq() function using that p internally before returning it?

5.) Can p and p_guess have any number of elements, depending on the complexity of the equation being used as a model, as long as the number of elements in p is equal to the number of elements in p_guess?

Recommended answer

Using leastsq yields a fit with the following sigmoid parameters:

x0 = 0.826964424481
y0 = 0.151506745435
c = 0.848564826467
k = -9.54442292022
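The script that produced these parameters is not included in this copy of the answer. As a rough sketch only (not the answerer's original code), a leastsq fit of the re-cast data from the question could look like the following. The starting values in p_guess are an assumption picked by eye for data on a 0-to-1 scale, and the fitted values may well differ from the parameters listed above.

```python
import numpy as np
import scipy.optimize

def sigmoid(p, x):
    # Four-parameter logistic curve: midpoint x0, offset y0, height c, steepness k
    x0, y0, c, k = p
    return c / (1 + np.exp(-k * (x - x0))) + y0

def residuals(p, x, y):
    # Difference between the observed y and the model prediction
    return y - sigmoid(p, x)

# Re-cast data taken from the question
x = np.array([1.000, 0.702, 0.576, 0.459, 0.397])
y = np.array([0.311, 0.408, 0.440, 0.440, 0.482])

# Assumed starting point: midpoint near the middle of x, offset near min(y),
# height near the spread of y, and a negative k because y falls as x rises
p_guess = (np.median(x), np.min(y), np.ptp(y), -5.0)

p, cov, infodict, mesg, ier = scipy.optimize.leastsq(
    residuals, p_guess, args=(x, y), full_output=True)

print("x0, y0, c, k =", p)
print("ier =", ier, "|", mesg)
```

With only five points and four parameters the fit is weakly constrained, so it is worth checking ier and mesg, and plotting the curve, before trusting the result.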

Note that for newer versions of scipy (e.g. 0.9) there is also the scipy.optimize.curve_fit function, which is easier to use than leastsq. A relevant discussion of fitting sigmoids using curve_fit can be found here.
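For comparison, here is a minimal curve_fit sketch. It runs on synthetic, noise-free data generated from known parameters (true_params and p_guess are made-up values for illustration, not taken from the question), so the fit can be checked against the truth. Note that curve_fit expects the model as f(x, *params) rather than f(p, x), and that it builds the residuals itself.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, x0, y0, c, k):
    # curve_fit wants the independent variable first, then the parameters
    return c / (1 + np.exp(-k * (x - x0))) + y0

# Synthetic data from known (assumed) parameters
true_params = (0.5, 0.2, 0.6, 10.0)
x = np.linspace(0.0, 1.0, 25)
y = sigmoid(x, *true_params)

# Initial guess, deliberately a little off from the true values
p_guess = (0.4, 0.1, 0.5, 5.0)

# popt holds the fitted parameters, pcov their estimated covariance
popt, pcov = curve_fit(sigmoid, x, y, p0=p_guess)
print(popt)
```

If standard deviations are available for each mean, as they are in the question, they can be passed through curve_fit's sigma argument to weight the points.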

A resize function was added so that the raw data could be rescaled and shifted to fit any desired bounding box.
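The resize function itself is not shown in this copy of the answer. A hypothetical version, assuming it simply maps the data linearly onto a chosen interval, might look like this (the name resize comes from the answer; the signature and the lower/upper arguments are assumptions):

```python
import numpy as np

def resize(arr, lower=0.0, upper=1.0):
    """Linearly rescale arr so that its values span [lower, upper]."""
    arr = np.asarray(arr, dtype=float)
    lo, hi = arr.min(), arr.max()
    return lower + (arr - lo) * (upper - lower) / (hi - lo)

# Raw data from the question
x = np.array([821, 576, 473, 377, 326])
y = np.array([255, 235, 208, 166, 157])

x_scaled = resize(x)                        # spans 0.0 to 1.0
y_scaled = resize(y, lower=0.3, upper=1.0)  # spans 0.3 to 1.0
print(x_scaled)
print(y_scaled)
```

Rescaling like this keeps the parameters at comparable magnitudes, which tends to make an initial guess easier to choose.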

"your name seems to pop up as a writer of the scipy documentation"

DISCLAIMER: I am not a writer of scipy documentation. I am just a user, and a novice at that. Much of what I know about leastsq comes from reading a tutorial written by Travis Oliphant.

1.) Does leastsq() call residuals(), which then returns the difference between the input y-vector and the y-vector returned by the sigmoid() function?

Yes! Exactly.

If so, how does it account for the difference in the lengths of the input y-vector and the y-vector returned by the sigmoid() function?

The lengths are the same:

In [138]: x
Out[138]: array([821, 576, 473, 377, 326])

In [139]: y
Out[139]: array([255, 235, 208, 166, 157])

In [140]: p=(600,200,100,0.01)

In [141]: sigmoid(p,x)
Out[141]: 
array([ 290.11439268,  244.02863507,  221.92572521,  209.7088641 ,
        206.06539033])

One of the wonderful things about Numpy is that it allows you to write "vector" equations that operate on entire arrays.

y = c / (1 + np.exp(-k*(x-x0))) + y0

might look like it works on floats (indeed it would), but if you make x a numpy array and c, k, x0, y0 floats, then the equation defines y to be a numpy array of the same shape as x. So sigmoid(p,x) returns a numpy array. There is a more complete explanation of how this works in the numpybook (required reading for serious users of numpy).
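A small sketch of that point, using the same sigmoid and the same p as the session above: the one function accepts either a single float or a whole array for x.

```python
import numpy as np

def sigmoid(p, x):
    x0, y0, c, k = p
    return c / (1 + np.exp(-k * (x - x0))) + y0

p = (600, 200, 100, 0.01)

# Scalar input: at x == x0 the logistic term equals c/2, so y = 50 + 200
single = sigmoid(p, 600.0)

# Array input: the same expression is applied element by element
many = sigmoid(p, np.array([821, 576, 473, 377, 326]))

print(single)      # 250.0
print(many.shape)  # (5,)
```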

2.) It looks like I can call leastsq() for any math equation, as long as I access that math equation through a residuals function, which in turn calls the math function. Is this true?

True. leastsq attempts to minimize the sum of the squares of the residuals (differences). It searches the parameter space (all possible values of p) looking for the p which minimizes that sum of squares. The x and y sent to residuals are your raw data values. They are fixed. They don't change. It's the ps (the parameters in the sigmoid function) that leastsq varies in order to minimize the sum of squares.

3.) Also, I notice that p_guess has the same number of elements as p. Does this mean that the four elements of p_guess correspond in order, respectively, with the values returned by x0, y0, c, and k?

Exactly so! Like Newton's method, leastsq needs an initial guess for p. You supply it as p_guess. When you see

scipy.optimize.leastsq(residuals,p_guess,args=(x,y))

you can think of it this way: as a first pass of the leastsq algorithm (really the Levenberg-Marquardt algorithm), leastsq calls residuals(p_guess,x,y). Notice the visual similarity between

(residuals,p_guess,args=(x,y))

and

residuals(p_guess,x,y)

It may help you remember the order and meaning of the arguments to leastsq.

residuals, like sigmoid, returns a numpy array. The values in the array are squared and then summed. This is the number to beat. p_guess is then varied as leastsq looks for a set of values that minimizes the sum of the squares of residuals(p_guess,x,y).
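To make "the number to beat" concrete, the sketch below computes the sum of squared residuals at the initial guess and again at the parameters leastsq returns. The data are synthetic and noise-free, and true_p, p_guess and the helper sum_of_squares are made up for illustration.

```python
import numpy as np
import scipy.optimize

def sigmoid(p, x):
    x0, y0, c, k = p
    return c / (1 + np.exp(-k * (x - x0))) + y0

def residuals(p, x, y):
    return y - sigmoid(p, x)

def sum_of_squares(p, x, y):
    # The scalar quantity that leastsq drives down by varying p
    return np.sum(residuals(p, x, y) ** 2)

# Synthetic data from assumed parameters
true_p = (0.5, 0.2, 0.6, 10.0)
x = np.linspace(0.0, 1.0, 25)
y = sigmoid(true_p, x)

p_guess = (0.4, 0.1, 0.5, 5.0)

# Without full_output, leastsq returns just the solution and a status flag
p_fit, ier = scipy.optimize.leastsq(residuals, p_guess, args=(x, y))

print("at the guess:", sum_of_squares(p_guess, x, y))
print("at the fit:  ", sum_of_squares(p_fit, x, y))
```

The value at the fit should be far smaller than the value at the guess; x and y are identical in both calls, and only p has changed.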

4.) Is the p that is sent as an argument to the residuals() and sigmoid() functions the same p that will be output by leastsq(), and is the leastsq() function using that p internally before returning it?

Well, not exactly. As you know by now, p_guess is varied as leastsq searches for the p value that minimizes the sum of the squares of residuals(p,x,y). The p (er, p_guess) that is sent to leastsq has the same shape as the p that is returned by leastsq. Obviously the values should be different unless you are a hell of a guesser :)

5.) Can p and p_guess have any number of elements, depending on the complexity of the equation being used as a model, as long as the number of elements in p is equal to the number of elements in p_guess?

Yes. I haven't stress-tested leastsq for very large numbers of parameters, but it is a thrillingly powerful tool.
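As an illustration of a model with a different number of parameters, here is a sketch that fits a two-parameter exponential decay with the same residuals pattern. The model, the synthetic data, and the starting values are all made up for this example; the only requirement is that p_guess has one entry per model parameter.

```python
import numpy as np
import scipy.optimize

def model(p, x):
    # Two-parameter exponential decay: amplitude a, rate b
    a, b = p
    return a * np.exp(-b * x)

def residuals(p, x, y):
    return y - model(p, x)

# Synthetic, noise-free data generated with a = 2.0, b = 0.5
x = np.linspace(0.0, 4.0, 20)
y = model((2.0, 0.5), x)

# Two parameters in the model, so two entries in the guess
p_guess = (1.0, 1.0)
p_fit, ier = scipy.optimize.leastsq(residuals, p_guess, args=(x, y))
print(p_fit)
```

Note that leastsq does need at least as many data points as parameters.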
