Splines inside nonlinear least squares in R


Problem description

Consider a nonlinear least squares model in R, for example one of the following form:

 y ~ theta / ( 1 + exp( -( alpha + beta * x) ) )

(My real problem has several variables, and the outer function is not logistic but a bit more involved; this one is simpler, but I think if I can do this, my case should follow almost immediately.)

I'd like to replace the term "alpha + beta * x" with (say) a natural cubic spline.

Here's some code to create example data with a nonlinear function inside the logistic:

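set.seed(438572L)
x <- seq(1, 10, by = 0.25)
# logistic curve with a nonlinear function of x inside, plus Gaussian noise (sd = 0.2)
y <- 8.6 / (1 + exp(-(-3 + x/4.4 + sqrt(x*1.1) * (1 - sin(1 + x/2.9))))) + rnorm(length(x), sd = 0.2)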

Without the need for a logistic around it, if I was in lm, I could replace a linear term with a spline term easily; so a linear model something like this:

 lm( y ~ x ) 

then becomes

 library("splines")
 lm( y ~ ns( x, df = 5 ) )

Generating fitted values is simple, and getting predicted values with the aid of (for example) the rms package seems simple enough.
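
As a minimal sketch of that lm route, assuming base R's predict() rather than rms (the names fit_lm, new_x and pred are made up for this example): because the spline term is written directly in the formula, predict() should rebuild the basis at the new x values with the knots chosen during fitting.

```r
library(splines)

# Example data from the question
set.seed(438572L)
x <- seq(1, 10, by = 0.25)
y <- 8.6 / (1 + exp(-(-3 + x/4.4 + sqrt(x*1.1) * (1 - sin(1 + x/2.9))))) +
  rnorm(length(x), sd = 0.2)

# Spline term written directly in the model formula
fit_lm <- lm(y ~ ns(x, df = 5))

# Predict on a finer grid; newdata only needs a column called x
new_x <- data.frame(x = seq(1, 10, by = 0.05))
pred  <- predict(fit_lm, newdata = new_x)
```

This convenience is exactly what is lost once the spline has to sit inside nls, where every coefficient needs its own name.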

Indeed, fitting the original data with that lm-based spline fit isn't too bad, but there's a reason I need it inside the logistic function (or rather, the equivalent in my problem).

The problem with nls is that I need to provide names for all the parameters. I'm quite happy to call them, say, b1, ..., b5 for one spline fit and c1, ..., c6 for another variable; I'll need to be able to make several of them.

Is there a reasonably neat way to generate the corresponding formula for nls so that I can replace the linear term inside the nonlinear function with a spline?

The only ways I can think of to do it are a bit awkward and clunky, and don't generalize nicely without writing a whole bunch of code.

(Edit for clarification) For this small problem, I can of course do it by hand: write out an expression for the inner product of the columns of the matrix generated by ns with the vector of parameters. But then I have to write the whole thing out term by term again for each spline in every other variable, again every time I change the df in any of the splines, and again if I want to use cs instead of ns. And when I want to do some prediction (or interpolation), there is a whole new slew of issues to deal with. I need to keep doing this, over and over, potentially for a substantially larger number of knots and over several variables, for analysis after analysis. So I wondered whether there is a neater, simpler way than writing out each individual term, without having to write a great deal of code. I can see a fairly bull-at-a-gate way to do it that would involve a fair bit of code to get right, but this being R, I suspect there's a much neater way (or more likely 3 or 4 neater ways) that's simply eluding me. Hence the question.

I thought I had seen someone do something like this in the past in a fairly nice way, but for the life of me I can't find it now; I've tried a bunch of times to locate it.

[More particularly, I'd generally like to be able to try fitting any of several different splines in each variable, to try a couple of possibilities, in order to see if I can find a simple model whose fit is still adequate for the purpose (noise is really quite low; some bias in the fit is okay to achieve a nice smooth result, but only up to a point). It's more 'find a nice, interpretable, but adequate fitting function' than anything approaching inference, and data mining isn't really an issue for this problem.]

Alternatively, if this would be much easier in say gnm or ASSIST or one of the other packages, that would be useful knowledge, but then some pointers on how to proceed on the toy problem above with them would help.

Solution

ns actually generates a matrix of predictors. What you can do is split that matrix out into individual variables, and feed them to nls.
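
To see what that matrix looks like, here is a quick check using the x from the question. It only inspects the basis and how data.frame() renames its columns; nothing is fitted.

```r
library(splines)

x <- seq(1, 10, by = 0.25)
m <- ns(x, df = 5)

dim(m)        # one row per x value, one column per basis function
colnames(m)   # "1" "2" "3" "4" "5": not usable as variable names in a formula

# data.frame() repairs the names (check.names = TRUE), giving X1, ..., X5
names(data.frame(m))
```

Those repaired names are what the nls formula has to refer to.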

library("splines")
m <- ns(x, df=5)
dat <- data.frame(y, m)  # X-variables will be named X1, ... X5
# starting values should be set as appropriate for your data
nls(y ~ theta * plogis(alpha + b1*X1 + b2*X2 + b3*X3 + b4*X4 + b5*X5), data=dat,
        start=list(theta=1, alpha=0, b1=1, b2=1, b3=1, b4=1, b5=1))
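
On the starting values: one possible way to get something less arbitrary, sketched here as an assumption rather than as part of the answer, is to guess theta from the largest response, undo the logistic, and fit the linear predictor with lm. The 1.05 factor and the 0.01/0.99 clamps are arbitrary choices, and theta0, p, lin and start are names invented for this sketch.

```r
library(splines)

# Example data from the question
set.seed(438572L)
x <- seq(1, 10, by = 0.25)
y <- 8.6 / (1 + exp(-(-3 + x/4.4 + sqrt(x*1.1) * (1 - sin(1 + x/2.9))))) +
  rnorm(length(x), sd = 0.2)

dat <- data.frame(y, ns(x, df = 5))   # columns y, X1, ..., X5

# Assumed heuristic: place the asymptote slightly above the largest response
theta0 <- 1.05 * max(y)

# Undo the logistic on a clamped scale, then fit the linear predictor with lm()
p   <- pmin(pmax(dat$y / theta0, 0.01), 0.99)
lin <- lm(qlogis(p) ~ X1 + X2 + X3 + X4 + X5, data = dat)

# Rename the lm coefficients to the parameter names used in the nls formula
start <- c(list(theta = theta0),
           as.list(setNames(coef(lin), c("alpha", paste0("b", 1:5)))))

# nls() may still fail to converge, hence the try()
fit <- try(nls(y ~ theta * plogis(alpha + b1*X1 + b2*X2 + b3*X3 + b4*X4 + b5*X5),
               data = dat, start = start), silent = TRUE)
```

If fit comes back as a "try-error", the start values or the df probably need another look.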

ETA: here's a go at automating the nls call for different values of df. It constructs the formula by text munging and then uses do.call to call nls. Caveat: untested.

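my.nls <- function(x, y, df)
{
    m <- ns(x, df=df)              # needs library("splines")
    dat <- data.frame(y, m)        # spline columns become X1, X2, ...
    xn <- names(dat)[-1]           # colnames(m) are "1", "2", ..., which would not work in a formula
    b <- paste("b", seq_along(xn), sep="")
    fm <- formula(paste("y ~ theta * plogis(alpha + ", paste(b, xn, sep="*",
          collapse=" + "), ")", sep=""))
    # starting values are placeholders; set them as appropriate for your data
    start <- c(1, 1, rep(1, length.out=length(b)))
    names(start) <- c("theta", "alpha", b)
    do.call(nls, list(fm, data=dat, start=start))
}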
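
The same text-munging idea might be stretched to several predictors and to prediction, which the question also asks about. The sketch below is an assumption-laden extension, not part of the answer: spline_nls_parts is a hypothetical helper, the Xb1/Xc1 column naming is invented here, and z is a made-up second predictor used only to show the mechanics.

```r
library(splines)

# Hypothetical helper, not part of the answer above.
# xs:  named list of predictors; each name doubles as the coefficient prefix
# dfs: spline degrees of freedom, one per predictor
spline_nls_parts <- function(y, xs, dfs) {
  bases <- Map(function(x, d) ns(x, df = d), xs, dfs)
  dat   <- data.frame(y = y)
  terms <- character(0)
  pars  <- character(0)
  for (p in names(bases)) {
    cols <- data.frame(bases[[p]])
    k    <- ncol(cols)
    names(cols) <- paste0("X", p, seq_len(k))          # e.g. Xb1, ..., Xb5
    dat   <- cbind(dat, cols)
    pars  <- c(pars, paste0(p, seq_len(k)))            # e.g. b1, ..., b5
    terms <- c(terms, paste0(p, seq_len(k), "*", names(cols)))
  }
  rhs <- paste0("theta * plogis(alpha + ", paste(terms, collapse = " + "), ")")
  list(formula = as.formula(paste("y ~", rhs)),
       data    = dat,
       start   = c(theta = 1, alpha = 0, setNames(rep(1, length(pars)), pars)),
       bases   = bases)   # kept so the same knots can be reused for prediction
}

# Example data from the question, plus a made-up second predictor z
set.seed(438572L)
x <- seq(1, 10, by = 0.25)
y <- 8.6 / (1 + exp(-(-3 + x/4.4 + sqrt(x*1.1) * (1 - sin(1 + x/2.9))))) +
  rnorm(length(x), sd = 0.2)
z <- log(x)

parts <- spline_nls_parts(y, list(b = x, c = z), dfs = c(5, 6))
parts$formula

# Basis columns at new x values, evaluated with the knots stored in the basis
new_x <- seq(1, 10, by = 0.05)
new_b <- data.frame(predict(parts$bases$b, new_x))
names(new_b) <- paste0("Xb", seq_len(ncol(new_b)))
```

A fit would then be attempted with do.call(nls, parts[c("formula", "data", "start")]), and predictions made by passing a data frame of such rebuilt columns as newdata to predict(). Since z here is just a function of x, this particular pair is not meant to be fitted, and the start values are placeholders.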
