predict.lm() in a loop. Warning: prediction from a rank-deficient fit may be misleading


Problem description

This R code raises a warning:

# Fit regression model to each cluster
y <- list() 
length(y) <- k
vars <- list() 
length(vars) <- k
f <- list()
length(f) <- k

for (i in 1:k) {
  vars[[i]] <- names(corc[[i]][corc[[i]]!= "1"])
  f[[i]]  <- as.formula(paste("Death ~", paste(vars[[i]], collapse= "+")))
  y[[i]]  <- lm(f[[i]], data=C1[[i]]) #training set
  C1[[i]] <- cbind(C1[[i]], fitted(y[[i]]))
  C2[[i]] <- cbind(C2[[i]], predict(y[[i]], C2[[i]])) #test set
}

I have a training data set (C1) and a test data set (C2). Each one has 129 variables. I ran a k-means cluster analysis on C1, split the data set by cluster membership, and created a list of clusters (C1[[1]], C1[[2]], ..., C1[[k]]). I also assigned a cluster membership to each case in C2 and created C2[[1]], ..., C2[[k]]. Then I fit a linear regression to each cluster in C1. My dependent variable is "Death". My predictors differ from cluster to cluster, and vars[[i]] (i = 1, ..., k) holds the predictor names for cluster i. I want to predict Death for each case in the test data set (C2[[1]], ..., C2[[k]]). When I run the code above, I get this warning for some of the clusters:

In predict.lm(y[[i]], C2[[i]]) :
prediction from a rank-deficient fit may be misleading

I have read a lot about this warning, but I couldn't figure out what the issue is.

Recommended answer

You can inspect the predict function with body(predict.lm). There you will see this line:

if (p < ncol(X) && !(missing(newdata) || is.null(newdata))) 
    warning("prediction from a rank-deficient fit may be misleading")

This check fires when the rank of the fit (p) is smaller than the number of columns of the model matrix (X) and newdata was supplied. In other words, the data could not support estimating every parameter you asked for. One way to trigger it is to have collinear covariates:

data <- data.frame(y=c(1,2,3,4), x1=c(1,1,2,3), x2=c(3,4,5,2), x3=c(4,2,6,0), x4=c(2,1,3,0))
data2 <- data.frame(x1=c(3,2,1,3), x2=c(3,2,1,4), x3=c(3,4,5,1), x4=c(0,0,2,3))
fit <- lm(y ~ ., data=data)

predict(fit, data2)
       1        2        3        4 
4.076087 2.826087 1.576087 4.065217 
Warning message:
In predict.lm(fit, data2) :
  prediction from a rank-deficient fit may be misleading

Notice that x3 and x4 point in the same direction in data: x3 is exactly twice x4. You can check for this situation with length(fit$coefficients) > fit$rank.
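To see which term is responsible, it can help to look at the coefficients that lm() left as NA. The following is a minimal sketch that reuses the toy data from the answer; the names rank_deficient and dropped are introduced here only for illustration.

```r
# Same toy data as in the answer: x3 is exactly 2 * x4
data <- data.frame(y = c(1, 2, 3, 4), x1 = c(1, 1, 2, 3), x2 = c(3, 4, 5, 2),
                   x3 = c(4, 2, 6, 0), x4 = c(2, 1, 3, 0))
fit <- lm(y ~ ., data = data)

# The rank check from the answer: more coefficients than the fit's rank
rank_deficient <- length(fit$coefficients) > fit$rank

# lm() reports coefficients it could not estimate as NA
dropped <- names(coef(fit))[is.na(coef(fit))]

print(rank_deficient)  # TRUE
print(dropped)         # "x4"
```

Removing the dropped term from the formula (or one of the terms it is collinear with) should give a full-rank fit, provided there are still at least as many observations as parameters.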

Another way is to have more parameters than observations. Here the full set of interactions asks for 16 coefficients, but data has only 4 rows:

fit2 <- lm(y ~ x1*x2*x3*x4, data=data)
predict(fit2, data2)
Warning message:
In predict.lm(fit2, data2) :
  prediction from a rank-deficient fit may be misleading
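Applied to a per-cluster loop like the one in the question, the same check can flag which clusters produce a rank-deficient fit before predict() is called. This is only a sketch under assumptions: the clusters list below is made-up data standing in for C1, and the names clusters, fits and deficient are hypothetical.

```r
# Hypothetical stand-in for the per-cluster training sets (C1 in the question)
set.seed(1)
clusters <- list(
  a = data.frame(Death = rnorm(20), x1 = rnorm(20), x2 = rnorm(20)),
  b = data.frame(Death = rnorm(20), x1 = rnorm(20), x2 = rnorm(20)),
  c = data.frame(Death = rnorm(3), x1 = rnorm(3), x2 = rnorm(3), x3 = rnorm(3))
)
clusters$b$x3 <- 2 * clusters$b$x1  # cluster b: exactly collinear predictor
# cluster c: 4 parameters (intercept + 3 slopes) but only 3 observations

# One model per cluster, using every other column as a predictor
fits <- lapply(clusters, function(d) lm(Death ~ ., data = d))

# TRUE where lm() could not estimate every coefficient
deficient <- vapply(fits, function(f) length(coef(f)) > f$rank, logical(1))

print(names(which(deficient)))
```

Clusters flagged this way are the ones whose predictor list needs attention, either because two predictors carry the same information or because the cluster is too small for the number of predictors.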
