Interpretation of the output of R function bs() (B-spline basis matrix)
Question
I often use B-splines for regression. Up to now I've never needed to understand the output of bs in detail: I would just choose the model I was interested in and fit it with lm. However, I now need to reproduce a B-spline model in external (non-R) code. So, what is the meaning of the matrix generated by bs? Example:
library(splines) # bs() lives in the splines package
x <- c(0.0, 11.0, 17.9, 49.3, 77.4)
bs(x, df = 3, degree = 1) # generate degree 1 (linear) B-splines with 2 internal knots
# 1 2 3
# [1,] 0.0000000 0.0000000 0.0000000
# [2,] 0.8270677 0.0000000 0.0000000
# [3,] 0.8198433 0.1801567 0.0000000
# [4,] 0.0000000 0.7286085 0.2713915
# [5,] 0.0000000 0.0000000 1.0000000
# attr(,"degree")
# [1] 1
# attr(,"knots")
# 33.33333% 66.66667%
# 13.30000 38.83333
# attr(,"Boundary.knots")
# [1] 0.0 77.4
# attr(,"intercept")
# [1] FALSE
# attr(,"class")
# [1] "bs" "basis" "matrix"
Ok, so degree is 1, as I specified in the input. knots tells me that the two internal knots are at x = 13.3000 and x = 38.8333 respectively. I was a bit surprised to see that the knots are at fixed quantiles: I had hoped R would find the best quantiles for my data, but of course that would make the model nonlinear, and it wouldn't be possible without knowing the response data anyway. intercept = FALSE means that no intercept was included in the basis (is that a good thing? I've always been taught not to fit linear models without an intercept... well, I guess lm is just adding one anyway).
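The two observations above (knots placed at quantiles of x, and the effect of intercept) can be checked directly. The following is a minimal sketch that assumes the same x as in the example; the names q and b_full are introduced here only for illustration.

```r
library(splines)

x <- c(0.0, 11.0, 17.9, 49.3, 77.4)
b <- bs(x, df = 3, degree = 1)

# Interior knots stored on the basis object
iknots <- attr(b, "knots")

# Tertiles of x, computed from x alone (no response data involved)
q <- quantile(x, probs = c(1, 2) / 3)

print(unname(iknots))
print(unname(q))

# Same knots, but with intercept = TRUE: the basis gains one extra column,
# and the basis functions then sum to 1 at every x inside the boundary knots
b_full <- bs(x, knots = iknots, degree = 1, intercept = TRUE)
print(dim(b_full))
print(rowSums(b_full))
```

If this runs as expected, the stored knots coincide with the tertiles of x, and dropping the first column of b_full gives back b. That dropped column is the one that would be redundant with the intercept lm adds by default.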
However, what about the matrix? I don't really understand how to interpret it. With three columns, I would think it means that there are three basis functions. This makes sense: if I have two internal knots K1 and K2, I will have a spline between the left boundary knot B1 and K1, another spline between K1 and K2, and a final one between K2 and the right boundary knot B2, so... three basis functions, ok. But which are the basis functions exactly? For example, what does this column mean?
# 1
# [1,] 0.0000000
# [2,] 0.8270677
# [3,] 0.8198433
# [4,] 0.0000000
# [5,] 0.0000000
This is similar to, but not precisely the same as, this question. That question asks about the interpretation of the regression coefficients, but I'm a step before that: I would like to understand the meaning of the entries of the model matrix. If I try to make the same plots as suggested in the first answer, I get a messed up plot:
b <- bs(x, df = 3, degree = 1)
b1 <- b[, 1] ## basis 1
b2 <- b[, 2] ## basis 2
b3 <- b[, 3] ## basis 3
par(mfrow = c(1, 3))
plot(x, b1, type = "l", main = "basis 1: b1")
plot(x, b2, type = "l", main = "basis 2: b2")
plot(x, b3, type = "l", main = "basis 3: b3")
These can't be the B-spline basis functions, because they have too many knots (each function should only have one).
The second answer would actually allow me to reconstruct my model outside R, so I guess I could go with that. However, that answer doesn't exactly explain what the elements of the b matrix are either: it deals with the coefficients of a linear regression, which I haven't introduced here yet. It's true that that is my final goal, but I also wanted to understand this intermediate step.
Recommended answer
The matrix b
# 1 2 3
# [1,] 0.0000000 0.0000000 0.0000000
# [2,] 0.8270677 0.0000000 0.0000000
# [3,] 0.8198433 0.1801567 0.0000000
# [4,] 0.0000000 0.7286085 0.2713915
# [5,] 0.0000000 0.0000000 1.0000000
is actually just the matrix of the values of the three basis functions at each point of x, which should have been obvious to me since it's exactly the same interpretation as for a polynomial linear model. As a matter of fact, since the boundary knots are
bknots <- attr(b,"Boundary.knots")
# [1] 0.0 77.4
and the internal knots are
iknots <- attr(b,"knots")
# 33.33333% 66.66667%
# 13.30000 38.83333
then the three basis functions, as shown here, are:
knots <- c(bknots[1],iknots,bknots[2])
y1 <- c(0,1,0,0)
y2 <- c(0,0,1,0)
y3 <- c(0,0,0,1)
par(mfrow = c(1, 3))
plot(knots, y1, type = "l", main = "basis 1: b1")
plot(knots, y2, type = "l", main = "basis 2: b2")
plot(knots, y3, type = "l", main = "basis 3: b3")
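The same shapes can also be obtained from the bs object itself rather than drawn by hand. The sketch below evaluates the basis on a dense grid with predict(); the names grid and bg are illustrative. It also suggests why plotting the columns of b against the five data points looks messy: those points do not coincide with the knots, so straight lines between them cut the corners of each basis function.

```r
library(splines)

x <- c(0.0, 11.0, 17.9, 49.3, 77.4)
b <- bs(x, df = 3, degree = 1)

# Dense grid spanning the boundary knots
grid <- seq(min(x), max(x), length.out = 200)

# Evaluate the same basis (same knots, same degree) at the grid points
bg <- predict(b, newx = grid)

# One curve per basis function, with all knots marked by dotted lines
matplot(grid, bg, type = "l", lty = 1, xlab = "x", ylab = "basis value")
abline(v = c(attr(b, "Boundary.knots"), attr(b, "knots")), lty = 3)
```

Each curve should peak at one knot and fall to zero at the neighbouring knots, except the third, which keeps rising up to the right boundary knot.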
Now, consider b[, 1]
# 1
# [1,] 0.0000000
# [2,] 0.8270677
# [3,] 0.8198433
# [4,] 0.0000000
# [5,] 0.0000000
These must be the values of b1 at x <- c(0.0, 11.0, 17.9, 49.3, 77.4). As a matter of fact, b1 is 0 at knots[1] = 0 and 1 at knots[2] = 13.3000, meaning that at x[2] (11.0) the value must be 11/13.3 = 0.8270677, as expected. Similarly, since b1 is 0 at knots[3] = 38.83333, the value at x[3] (17.9) must be (38.83333 - 17.9)/(38.83333 - 13.3) = 0.8198433. Since x[4], x[5] > knots[3] = 38.83333, b1 is 0 there. A similar interpretation can be given for the other two columns.
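Since the original goal was to reproduce the basis outside R, the interpolation argument can be written out as plain arithmetic and compared with bs(). This is a minimal sketch for the degree 1 case only; tent() is a hypothetical helper defined here, not a function from splines, and it is only meant for x inside the boundary knots.

```r
library(splines)

x <- c(0.0, 11.0, 17.9, 49.3, 77.4)
b <- bs(x, df = 3, degree = 1)
knots <- c(attr(b, "Boundary.knots")[1], attr(b, "knots"),
           attr(b, "Boundary.knots")[2])

# tent(): hypothetical helper, not part of splines.
# Piecewise-linear function equal to 1 at `mid` and 0 at `left` and `right`.
# Pass right = NA for the last basis function, which has no falling side.
tent <- function(x, left, mid, right) {
  up <- (x - left) / (mid - left)
  down <- if (is.na(right)) rep(1, length(x)) else (right - x) / (right - mid)
  pmax(0, pmin(up, down))
}

manual <- cbind(
  tent(x, knots[1], knots[2], knots[3]),
  tent(x, knots[2], knots[3], knots[4]),
  tent(x, knots[3], knots[4], NA)
)

print(round(manual, 7))
print(all.equal(manual, unclass(b), check.attributes = FALSE))
```

The body of tent() uses only subtraction, division, min and max, so it should port to other languages without difficulty. Higher degrees would need the general B-spline recursion instead.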