R中带有mgcv的粗薄板样条拟合(薄板样条插值) [英] Rough thin-plate spline fitting (thin-plate spline interpolation) in R with mgcv

查看:170
本文介绍了R中带有mgcv的粗薄板样条拟合(薄板样条插值)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

背景

我正在尝试在

Background

I am trying to replicate figure 2.6 in the book An Introduction to Statistical Learning:

一个粗糙的薄板样条曲线拟合图2.3中的收入数据.这 拟合使训练数据错误为零.

A rough thin-plate spline fit to the Income data from Figure 2.3. This fit makes zero errors on the training data.

到目前为止,我尝试了什么?

我试图复制先前的图2.5,它是平滑的薄板样条曲线拟合,不确定是否成功.

What have I tried so far?

I tried to replicate the previous figure 2.5, a smooth thin-plate spline fit, not sure if succesfully.

income_2 <- read.csv("http://www-bcf.usc.edu/~gareth/ISL/Income2.csv")
library(mgcv)
model1 <- gam(Income ~ te(Education, Seniority, bs=c("tp", "tp")), data = income_2) 
x <- range(income_2$Education)
x <- seq(x[1], x[2], length.out=30)
y <- range(income_2$Seniority)
y <- seq(y[1], y[2], length.out=30)
z <- outer(x,y,
           function(Education,Seniority)
                     predict(model1, data.frame(Education,Seniority)))
p <- persp(x,y,z, theta=30, phi=30,
           col="yellow",expand = 0.5,shade = 0.2,
           xlab="Education", ylab="Seniority", zlab="Income")
obs <- trans3d(income_2$Education, income_2$Seniority,income_2$Income,p)
pred <- trans3d(income_2$Education, income_2$Seniority,fitted(model1),p)
points(obs, col="red",pch=16)
segments(obs$x, obs$y, pred$x, pred$y)

双重问题

  1. 我是否使用,文档中说:它们是薄板样条线的降阶版本,并使用薄板样条线罚分."
  2. 如何创建粗糙的薄板样条曲线拟合,使训练数据误差为零? (上面的第一个数字)
  1. Did I create a proper smooth thin-plate with gam? I am using as smooth.terms bs="tp", and the documentation says 'They are reduced rank versions of the thin plate splines and use the thin plate spline penalty.'
  2. How can I can create a rough thin-plate spline fit making zero errors on the training data? (First figure above)

推荐答案

income_2 <- structure(list(Education = c(21.5862068965517, 18.2758620689655, 
12.0689655172414, 17.0344827586207, 19.9310344827586, 18.2758620689655, 
19.9310344827586, 21.1724137931034, 20.3448275862069, 10, 13.7241379310345, 
18.6896551724138, 11.6551724137931, 16.6206896551724, 10, 20.3448275862069, 
14.1379310344828, 16.6206896551724, 16.6206896551724, 20.3448275862069, 
18.2758620689655, 14.551724137931, 17.448275862069, 10.4137931034483, 
21.5862068965517, 11.2413793103448, 19.9310344827586, 11.6551724137931, 
12.0689655172414, 17.0344827586207), Seniority = c(113.103448275862, 
119.310344827586, 100.689655172414, 187.586206896552, 20, 26.2068965517241, 
150.344827586207, 82.0689655172414, 88.2758620689655, 113.103448275862, 
51.0344827586207, 144.137931034483, 20, 94.4827586206897, 187.586206896552, 
94.4827586206897, 20, 44.8275862068966, 175.172413793103, 187.586206896552, 
100.689655172414, 137.931034482759, 94.4827586206897, 32.4137931034483, 
20, 44.8275862068966, 168.965517241379, 57.2413793103448, 32.4137931034483, 
106.896551724138), Income = c(99.9171726114381, 92.579134855529, 
34.6787271520874, 78.7028062353695, 68.0099216471551, 71.5044853814318, 
87.9704669939115, 79.8110298331255, 90.00632710858, 45.6555294997364, 
31.9138079371295, 96.2829968022869, 27.9825049000603, 66.601792415137, 
41.5319924201478, 89.00070081522, 28.8163007592387, 57.6816942573605, 
70.1050960424457, 98.8340115435447, 74.7046991976891, 53.5321056283034, 
72.0789236655191, 18.5706650327685, 78.8057842852386, 21.388561306174, 
90.8140351180409, 22.6361626208955, 17.613593041445, 74.6109601985289
)), .Names = c("Education", "Seniority", "Income"), row.names = c(NA, 
-30L), class = "data.frame")

library(mgcv)

首先,您可以将s(Education, Seniority, bs = 'tp')用于二元薄板样条曲线,而不是使用张量积构造.薄板样条线在任何尺寸上都是明确定义的.

First of all, you can just use s(Education, Seniority, bs = 'tp') for a bivariate thin-plate spline rather than using tensor product construction. A thin-plate spline is well-defined on any dimension.

第二,mgcv不会进行回归而不是插值,因此,如果不进行调整,就无法使拟合的样条曲线遍历所有点.薄板样条的调整包括:

Secondly, mgcv does regression not interpolation, so without a tweak you can't have the fitted spline running through all points. The tweak for a thin-plate spline includes:

  1. 通过将k设置为精确的唯一采样点数(如果您有2000个以上的唯一数据位置,则也设置为xt)来禁用bs = 'tp'后面的低秩逼近;
  2. 设置sp = 0以禁用样条线的惩罚.
  1. disable low-rank approximation behind bs = 'tp', by setting k to exactly the number of unique sampling points (and xt as well if you have more than 2000 unique data locations);
  2. set sp = 0 to disable penalization of the spline.

数据集income_2中唯一采样位置的数量为

The number of unique sampling locations in your dataset income_2 is

xt <- unique(income_2[c("Education", "Seniority")]) 
nrow(xt)
#[1] 30

因为30小于2000,所以我们可以设置k = 30,而无需将xt传递给s(, bs = 'tp').

Because 30 is smaller than 2000, we can just set k = 30 and don't need to pass xt to s(, bs = 'tp').

interpolation_model <- gam(Income ~ s(Education, Seniority, k = 30, sp = 0),
                           data = income_2)
interpolation_model$residuals
# [1]  2.131628e-13  2.728484e-12  4.561684e-12  1.264766e-12  3.495870e-12
# [6]  4.177991e-12 -1.023182e-12  1.193712e-12  2.231104e-12  6.878054e-12
#[11]  6.309619e-12  6.679102e-13  7.574386e-12  3.637979e-12  4.227729e-12
#[16]  1.790568e-12  4.376943e-12  5.130119e-12  8.242296e-13 -6.536993e-13
#[21]  2.771117e-12  1.811884e-12  3.495870e-12  9.141132e-12  2.117417e-12
#[26]  7.243983e-12 -3.979039e-13  6.352252e-12  6.203038e-12  3.652190e-12

现在您看到所有残差均为零.

Now you see that all residuals are zero.

您还可以查找直接执行薄板样条插值的其他程序包.

You can also look for other packages that do thin-plate spline interpolation directly.

薄板样条线是各向同性/径向的,如果变量的比例不同,请小心!

谢谢您的解释并解决了我的问题.你知道为什么花键表面看起来起伏更大而脊更少吗?

Thank you for the explanation and solving my question. Do you know why the spline surface looks more undulating with less ridges?

因为两个变量的比例差异很大.您要先对两个变量进行标准化,然后再拟合薄板样条线.

Because your two variables differ in scale greatly. You want to standardize your two variables first, then fit a thin plate spline.

## this is how your original data look like on the 2D domain
with(income_2, plot(Education, Seniority, asp = 1))

## let's scale it
xt_scaled <- scale(xt)
dat <- data.frame(xt_scaled, Income = income_2$Income)

with(dat, plot(Education, Seniority, asp = 1))

## fit a model on scaled data
interpolation_model <- gam(Income ~ s(Education, Seniority, k = 30, sp = 0),
                           data = dat)

## grid on the transformed space
x <- range(dat$Education)
x <- seq(x[1], x[2], length.out=30)
y <- range(dat$Seniority)
y <- seq(y[1], y[2], length.out=30)

## prediction on the transformed space
newdat <- expand.grid(Education = x, Seniority = y)
z <- matrix(predict(interpolation_model, newdat), nrow = length(x))

现在要生成图,我们想将网格反向转换为其原始比例.请注意,这不需要转换预测值.

Now to produce plot, we want to back transform the grid to its original scale. Note there this is no need to transform the predicted values.

## back transform the grid
scaled_center <- attr(xt_scaled, "scaled:center")
#Education Seniority 
# 16.38621  93.86207 
scaled_scale <- attr(xt_scaled, "scaled:scale")
#Education Seniority 
# 3.810622 55.715623 
xx <- x * scaled_scale[1] + scaled_center[1]
yy <- y * scaled_scale[2] + scaled_center[2]

## use `xx`, `yy` and `z`
p <- persp(xx, yy, z, theta = 30, phi = 30,
           col = "yellow",expand = 0.5, shade = 0.2,
           xlab = "Education", ylab = "Seniority", zlab = "Income")
obs <- trans3d(income_2$Education, income_2$Seniority, income_2$Income, p)
pred <- trans3d(income_2$Education, income_2$Seniority, fitted(interpolation_model), p)
points(obs, col="red",pch=16)
segments(obs$x, obs$y, pred$x, pred$y)

这篇关于R中带有mgcv的粗薄板样条拟合(薄板样条插值)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆