predict.lm() in a loop. Warning: prediction from a rank-deficient fit may be misleading

Question

This R code raises a warning:

# Fit regression model to each cluster
y <- list() 
length(y) <- k
vars <- list() 
length(vars) <- k
f <- list()
length(f) <- k

for (i in 1:k) {
  vars[[i]] <- names(corc[[i]][corc[[i]]!= "1"])
  f[[i]]  <- as.formula(paste("Death ~", paste(vars[[i]], collapse= "+")))
  y[[i]]  <- lm(f[[i]], data=C1[[i]]) #training set
  C1[[i]] <- cbind(C1[[i]], fitted(y[[i]]))
  C2[[i]] <- cbind(C2[[i]], predict(y[[i]], C2[[i]])) #test set
}

I have a training data set (C1) and a test data set (C2). Each one has 129 variables. I ran a k-means cluster analysis on C1, split the data set by cluster membership, and created a list of clusters (C1[[1]], C1[[2]], ..., C1[[k]]). I also assigned a cluster membership to each case in C2 and created C2[[1]], ..., C2[[k]]. Then I fit a linear regression to each cluster in C1. My dependent variable is "Death". The predictors differ from cluster to cluster, and vars[[i]] (i = 1, ..., k) holds the predictor names for cluster i. I want to predict Death for each case in the test data set (C2[[1]], ..., C2[[k]]).

When I run the code above, I get this warning for some of the clusters:

In predict.lm(y[[i]], C2[[i]]) :
prediction from a rank-deficient fit may be misleading

I have read a lot about this warning, but I can't figure out what the issue is.

Recommended answer

You can inspect the predict function with body(predict.lm). There you will see this line:

if (p < ncol(X) && !(missing(newdata) || is.null(newdata))) 
    warning("prediction from a rank-deficient fit may be misleading")
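
To make the condition concrete, here is a minimal sketch on toy data (the data frame d, the helper warns() and the object names are hypothetical, not taken from the question). In predict.lm(), p is the rank of the fitted model and X is the design matrix, and the second half of the condition means the warning is only considered when newdata is supplied. That would explain why fitted() stays silent in the loop while predict() on C2 warns. Newer R versions may only warn when the new rows fall outside the predictor space seen in training, so the new rows below deliberately break the collinear relation.

```r
# Toy data (hypothetical, not the asker's): x2 is an exact multiple of x1.
d <- data.frame(y = c(1.2, 1.9, 3.1, 4.2, 4.8, 6.1), x1 = 1:6)
d$x2 <- 2 * d$x1
fit_d <- lm(y ~ x1 + x2, data = d)

# The two quantities compared in the condition above.
p <- fit_d$rank                        # 2: only intercept and x1 are estimable
n_cols <- ncol(model.matrix(fit_d))    # 3: intercept, x1, x2

# Hypothetical helper: TRUE if evaluating expr raises any warning.
warns <- function(expr) {
  tryCatch({ expr; FALSE }, warning = function(w) TRUE)
}

# New rows that deliberately break the x2 = 2 * x1 relation.
new_rows <- data.frame(x1 = c(1, 2), x2 = c(5, 1))

warn_without_newdata <- warns(predict(fit_d))                      # newdata missing
warn_with_newdata    <- warns(predict(fit_d, newdata = new_rows))  # newdata supplied
```

Here p < n_cols holds, so only the call that passes newdata is expected to warn.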

The warning is raised when the rank of the fit (p) is smaller than the number of columns of the design matrix (ncol(X)), that is, when your data cannot identify all the parameters you asked lm() to fit. One way to trigger it is to have collinear covariates:

data <- data.frame(y=c(1,2,3,4), x1=c(1,1,2,3), x2=c(3,4,5,2), x3=c(4,2,6,0), x4=c(2,1,3,0))
data2 <- data.frame(x1=c(3,2,1,3), x2=c(3,2,1,4), x3=c(3,4,5,1), x4=c(0,0,2,3))
fit <- lm(y ~ ., data=data)

predict(fit, data2)
       1        2        3        4 
4.076087 2.826087 1.576087 4.065217 
Warning message:
In predict.lm(fit, data2) :
  prediction from a rank-deficient fit may be misleading

Notice that x3 and x4 point in the same direction in data: x3 is exactly 2 * x4. You can check for rank deficiency with length(fit$coefficients) > fit$rank, which is TRUE when the fit is rank-deficient. (With only 4 rows and 5 coefficients, this toy fit would be rank-deficient even without the collinearity.)
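
As a rough sketch of that check, using the same toy data as above: lm() returns NA for coefficients it could not estimate, so listing the NA entries shows which terms were dropped. The variable names rank_deficient and dropped are only illustrative.

```r
# Same toy data as in the answer: x3 is exactly 2 * x4.
data <- data.frame(y = c(1, 2, 3, 4), x1 = c(1, 1, 2, 3), x2 = c(3, 4, 5, 2),
                   x3 = c(4, 2, 6, 0), x4 = c(2, 1, 3, 0))
fit <- lm(y ~ ., data = data)

# TRUE when there are more coefficients than the fit has rank.
rank_deficient <- length(coef(fit)) > fit$rank

# Coefficients lm() could not estimate come back as NA.
dropped <- names(coef(fit))[is.na(coef(fit))]

print(rank_deficient)
print(dropped)
```

Dropping the listed terms from the formula, or removing the redundant predictor before fitting, is one way to obtain a full-rank model.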

Another way is to have more parameters than observations. Here the full interaction x1*x2*x3*x4 expands to 16 coefficients, but data has only 4 rows:

fit2 <- lm(y ~ x1*x2*x3*x4, data=data)
predict(fit2, data2)
Warning message:
In predict.lm(fit2, data2) :
  prediction from a rank-deficient fit may be misleading
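
Applied to a per-cluster loop like the one in the question, the same check can report which clusters are affected before predicting. The following is only a sketch under assumptions: the helper fit_and_check(), the simulated clusters and the predictors a, b, c are hypothetical stand-ins, not the asker's C1, C2 or vars.

```r
# Hypothetical helper: fit lm() and list the terms it could not estimate.
fit_and_check <- function(formula, train) {
  fit <- lm(formula, data = train)
  dropped <- names(coef(fit))[is.na(coef(fit))]
  list(fit = fit, rank_deficient = length(dropped) > 0, dropped = dropped)
}

# Simulated stand-ins for the C1 and C2 lists (k = 2 clusters).
set.seed(1)
make_cluster <- function(n, collinear) {
  d <- data.frame(a = rnorm(n), b = rnorm(n))
  d$c <- if (collinear) 2 * d$a else rnorm(n)   # exact multiple of a when collinear
  d$Death <- 1 + d$a - d$b + rnorm(n, sd = 0.1)
  d
}
C1 <- list(make_cluster(20, FALSE), make_cluster(20, TRUE))
C2 <- list(make_cluster(5, FALSE), make_cluster(5, TRUE))

k <- length(C1)
checks <- vector("list", k)
for (i in seq_len(k)) {
  checks[[i]] <- fit_and_check(Death ~ a + b + c, C1[[i]])
  if (checks[[i]]$rank_deficient) {
    message("cluster ", i, ": could not estimate ",
            paste(checks[[i]]$dropped, collapse = ", "))
  }
  # The deficiency was already reported above, so the predict() warning
  # is suppressed here rather than ignored silently.
  C2[[i]]$pred <- suppressWarnings(predict(checks[[i]]$fit, C2[[i]]))
}
```

Reporting the dropped terms per cluster makes it easier to see whether the cause is collinear predictors or a cluster with too few rows for its formula.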
