Interpretation of the output of R function bs() (B-spline basis matrix)

Question

I often use B-splines for regression. Until now I have never needed to understand the output of bs in detail: I would just choose the model I was interested in and fit it with lm. However, I now need to reproduce a B-spline model in external (non-R) code. So, what is the meaning of the matrix generated by bs? Example:

library(splines)  # bs() lives in the splines package
x <- c(0.0, 11.0, 17.9, 49.3, 77.4)
bs(x, df = 3, degree = 1) # generate degree 1 (linear) B-splines with 2 internal knots
#              1         2         3
# [1,] 0.0000000 0.0000000 0.0000000    
# [2,] 0.8270677 0.0000000 0.0000000    
# [3,] 0.8198433 0.1801567 0.0000000    
# [4,] 0.0000000 0.7286085 0.2713915    
# [5,] 0.0000000 0.0000000 1.0000000   
# attr(,"degree")
# [1] 1
# attr(,"knots")
# 33.33333% 66.66667% 
#  13.30000  38.83333 
# attr(,"Boundary.knots")
# [1]  0.0 77.4
# attr(,"intercept")
# [1] FALSE
# attr(,"class")
# [1] "bs"     "basis"  "matrix"

OK, so degree is 1, as I specified in the input. knots tells me that the two internal knots are at x = 13.3000 and x = 38.8333 respectively. I was a bit surprised to see that the knots are at fixed quantiles of x: I had hoped R would find the best knot positions for my data, but of course that would make the model nonlinear in its parameters, and it would not be possible without knowing the response data. intercept = FALSE means that no intercept was included in the basis (is that a good thing? I've always been taught not to fit linear models without an intercept... well, I guess lm just adds one anyway).
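Both observations in the paragraph above can be checked directly. The following is a minimal sketch, assuming the same x as in the example; the variable names (q, b_full, knots_match, row_sums, same_basis) are introduced here for illustration only. It compares the knots attribute with the 1/3 and 2/3 sample quantiles of x, and then builds the basis with intercept = TRUE to see what the default leaves out.

```r
library(splines)

x <- c(0.0, 11.0, 17.9, 49.3, 77.4)
b <- bs(x, df = 3, degree = 1)

# Internal knots sit at the 1/3 and 2/3 sample quantiles of x
q <- quantile(x, probs = c(1, 2) / 3)
knots_match <- isTRUE(all.equal(unname(q), unname(attr(b, "knots"))))

# With intercept = TRUE the basis gains one column (df = 4 keeps the same
# two internal knots) and every row sums to 1
b_full <- bs(x, df = 4, degree = 1, intercept = TRUE)
row_sums <- rowSums(b_full)

# Dropping the first column of the full basis recovers b
same_basis <- isTRUE(all.equal(unclass(b_full)[, -1], unclass(b),
                               check.attributes = FALSE))
```

Because the rows of the full basis sum to 1, the full basis already spans the constant term. That appears to be why the default drops one column: lm can then add its own intercept without making the design matrix rank deficient.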

However, what about the matrix? I don't really understand how to interpret it. With three columns, I would think it means that there are three basis functions. This makes sense: if I have two internal knots K1 and K2, I will have a spline between the left boundary knot B1 and K1, another between K1 and K2, and a final one between K2 and B2, so... three basis functions, OK. But what exactly are the basis functions? For example, what does this column mean?

#              1
# [1,] 0.0000000
# [2,] 0.8270677
# [3,] 0.8198433
# [4,] 0.0000000
# [5,] 0.0000000

This is similar to, but not precisely the same as, this question. That question asks about the interpretation of the regression coefficients, but I'm a step before that: I would like to understand the meaning of the entries of the model matrix. If I try to make the same plots as suggested in the first answer, I get a messed-up plot:

b <- bs(x, df = 3, degree = 1)
b1 <- b[, 1]  ## basis 1
b2 <- b[, 2]  ## basis 2
b3 <- b[, 3]  ## basis 3
par(mfrow = c(1, 3))
plot(x, b1, type = "l", main = "basis 1: b1")
plot(x, b2, type = "l", main = "basis 2: b2")
plot(x, b3, type = "l", main = "basis 3: b3")

These can't be the B-spline basis functions, because they have too many knots (each function should only have one).
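One likely reason the plot looks wrong is that the five data points do not coincide with the knots, so joining them with straight lines puts the kinks at the data points instead. A sketch of one way around this, assuming the same x and b as above, is to evaluate the same basis on a dense grid with predict(), which reuses the knots stored in b. The names xx and bb are introduced here for illustration.

```r
library(splines)

x <- c(0.0, 11.0, 17.9, 49.3, 77.4)
b <- bs(x, df = 3, degree = 1)

# Evaluate the same basis (same knots) on a dense grid of 200 points
xx <- seq(min(x), max(x), length.out = 200)
bb <- predict(b, xx)

# One curve per basis function; dotted vertical lines mark the internal knots
matplot(xx, bb, type = "l", lty = 1, xlab = "x", ylab = "basis value")
abline(v = attr(b, "knots"), lty = 3)
```

On the dense grid each curve should show its kinks at the knots rather than at the data points.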

The second answer would actually allow me to reconstruct my model outside R, so I guess I could go with that. However, that answer doesn't exactly explain what the elements of the b matrix are either: it deals with the coefficients of a linear regression, which I haven't introduced here yet. It's true that that is my final goal, but I also wanted to understand this intermediate step.

Recommended answer

The matrix b

#              1         2         3
# [1,] 0.0000000 0.0000000 0.0000000    
# [2,] 0.8270677 0.0000000 0.0000000    
# [3,] 0.8198433 0.1801567 0.0000000    
# [4,] 0.0000000 0.7286085 0.2713915    
# [5,] 0.0000000 0.0000000 1.0000000  

is actually just the matrix of the values of the three basis functions at each point of x, which should have been obvious to me, since it's exactly the same interpretation as for a polynomial linear model. As a matter of fact, since the boundary knots are

bknots <- attr(b,"Boundary.knots")
# [1]  0.0 77.4

and the internal knots are

iknots <- attr(b,"knots")
# 33.33333% 66.66667% 
#  13.30000  38.83333 

then the three basis functions, as shown here, are:

knots <- c(bknots[1],iknots,bknots[2])
y1 <- c(0,1,0,0)
y2 <- c(0,0,1,0)
y3 <- c(0,0,0,1)
par(mfrow = c(1, 3))
plot(knots, y1, type = "l", main = "basis 1: b1")
plot(knots, y2, type = "l", main = "basis 2: b2")
plot(knots, y3, type = "l", main = "basis 3: b3")

Now, consider b[, 1]

#              1
# [1,] 0.0000000
# [2,] 0.8270677
# [3,] 0.8198433
# [4,] 0.0000000
# [5,] 0.0000000

These must be the values of b1 at x <- c(0.0, 11.0, 17.9, 49.3, 77.4). As a matter of fact, b1 is 0 at knots[1] = 0 and 1 at knots[2] = 13.3000, so at x[2] (11.0) the value must be 11/13.3 = 0.8270677, as expected. Similarly, since b1 falls back to 0 at knots[3] = 38.83333, the value at x[3] (17.9) must be (38.83333 - 17.9)/(38.83333 - 13.3) = 0.8198433. Since x[4], x[5] > knots[3] = 38.83333, b1 is 0 there. A similar interpretation can be given for the other two columns.
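The same reasoning can be written as a small function, which may be a useful template when porting the basis to non-R code. This is only a sketch for the degree = 1, intercept = FALSE case discussed here; hat_basis is a hypothetical helper written for this illustration, not part of the splines package.

```r
library(splines)

# Hypothetical helper: evaluates the degree-1 ("hat") basis functions by hand.
# Column j rises linearly from 0 at knot j to 1 at knot j + 1, then falls
# back to 0 at knot j + 2 (the last column has no falling part).
hat_basis <- function(x, iknots, bknots) {
  k <- c(bknots[1], iknots, bknots[2])   # full knot sequence
  m <- length(k) - 1                     # number of basis functions
  out <- matrix(0, nrow = length(x), ncol = m)
  for (j in seq_len(m)) {
    left <- k[j]
    peak <- k[j + 1]
    rising <- x >= left & x <= peak
    out[rising, j] <- (x[rising] - left) / (peak - left)
    if (j < m) {
      right <- k[j + 2]
      falling <- x > peak & x <= right
      out[falling, j] <- (right - x[falling]) / (right - peak)
    }
  }
  out
}

x <- c(0.0, 11.0, 17.9, 49.3, 77.4)
b <- bs(x, df = 3, degree = 1)
manual <- hat_basis(x, attr(b, "knots"), attr(b, "Boundary.knots"))

# Largest absolute difference between the hand-built basis and bs()
max_diff <- max(abs(manual - unclass(b)))
```

If max_diff is zero up to floating-point error, the hand-built matrix matches the one returned by bs. Points outside the boundary knots are not handled by this sketch.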
