Linear model singular because of large integer datetime in R?

Question

A simple regression of random normal values on a datetime variable fails, but the same data with small integers in place of the dates works as expected.

# Example dataset with 100 observations at 2 second intervals.
set.seed(1)
df <- data.frame(x=as.POSIXct("2017-03-14 09:00:00") + seq(0, 199, 2),
                 y=rnorm(100))

#> head(df)
#                     x          y
# 1 2017-03-14 09:00:00 -0.6264538
# 2 2017-03-14 09:00:02  0.1836433
# 3 2017-03-14 09:00:04 -0.8356286

# Simple regression model.
m <- lm(y ~ x, data=df)

The slope is missing because of singularities in the data. Calling summary() on the model shows this:

summary(m)

# Coefficients: (1 not defined because of singularities)
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  0.10889    0.08982   1.212    0.228
# x                 NA         NA      NA       NA

Could this be because of the POSIXct class?

# Convert date variable to integer.
df$x2 <- as.integer(df$x)
lm(y ~ x2, data=df)

# Coefficients:
# (Intercept)           x2  
#      0.1089           NA

No, the coefficient for x2 is still missing.

What if we make the baseline of x2 zero?

# Subtract minimum of x.
df$x3 <- df$x2 - min(df$x2)
lm(y ~ x3, data=df)

# Coefficients:
# (Intercept)           x3  
#   0.1312147   -0.0002255

This works!

One more example, to rule out the datetime variable itself as the cause.

# Subtract large constant from date (data is now from 1985).
df$x4 <- df$x - 1000000000
lm(y ~ x4, data=df)

# Coefficients:
# (Intercept)           x4  
#   1.104e+05   -2.255e-04

Unexpectedly, this works too. Why would an identical dataset shifted back by about 30 years behave differently?

It could be that .Machine$integer.max (2147483647 on my PC) has something to do with it, but I can't figure it out. I would greatly appreciate it if someone could explain what is going on here.

Recommended answer

Yes, it could. QR factorization is stable, but it is not almighty God.

X <- cbind(1, 1e+11 + 1:10000)
qr(X)$rank
# 1

Here X is like the model matrix for your linear regression model: there is an all-ones column for the intercept, and there is a sequence standing in for the datetime (note the large offset).
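A rough way to see why the rank comes out as 1: as far as I understand it, R's default qr() (the LINPACK routine, which lm() also uses) treats a column as linearly dependent when what is left of it after removing the earlier columns is smaller than tol (default 1e-07) times the column's original norm. With an intercept column first, "what is left" is the centered column. The helper residual_ratio below is a made-up name for illustration, not part of R, and tz = "UTC" is an assumption added only to make the sketch reproducible.

```r
# Hypothetical helper: norm of the centered column (the part not explained
# by the intercept) divided by the norm of the raw column.
residual_ratio <- function(x) {
  x <- as.numeric(x)
  sqrt(sum((x - mean(x))^2)) / sqrt(sum(x^2))
}

# The matrix column from the answer: large offset, small spread.
x_big <- 1e+11 + 1:10000
residual_ratio(x_big) < 1e-07    # below the default tolerance

# The question's data: seconds since 1970 for March 2017 (tz assumed UTC).
t0 <- as.numeric(as.POSIXct("2017-03-14 09:00:00", tz = "UTC"))
x_2017 <- t0 + seq(0, 199, 2)
x_1985 <- x_2017 - 1e9           # same shift as x4 in the question

residual_ratio(x_2017)           # below 1e-07, so the slope is dropped
residual_ratio(x_1985)           # slightly above 1e-07, so the fit succeeds
```

If this reading of the tolerance is right, it would explain why the 1985 version works: nothing special happens at .Machine$integer.max, the shifted column just lands barely on the other side of the 1e-07 threshold.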

If you center the datetime column, the two columns become orthogonal, so the problem is very stable (even when solving the normal equations directly!).
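Applied to the data from the question, centering might look like the sketch below. The column name xc and the tz = "UTC" argument are assumptions added for illustration; the rest follows the question's setup.

```r
# Rebuild the example data from the question (tz fixed for reproducibility).
set.seed(1)
df <- data.frame(x = as.POSIXct("2017-03-14 09:00:00", tz = "UTC") + seq(0, 199, 2),
                 y = rnorm(100))

# Center the datetime, in seconds, so it is orthogonal to the intercept column.
secs <- as.numeric(df$x)
df$xc <- secs - mean(secs)

# Both coefficients are now estimated; the slope is per second.
m <- lm(y ~ xc, data = df)
coef(m)
```

The slope should match the one obtained with x3 and x4 in the question, since shifting a predictor by a constant changes only the intercept. After centering, the intercept is the mean of y.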
