PLSR in R with "pls" package

Problem description

I'm trying to fit a PLSR model, but I'm doing something wrong. Below you can see how I created the data frame, and its structure.

library(readxl)  # read_excel()
library(pls)     # plsr()

reflektance <- read_excel("data/reflektance.xlsx", na = "NA")
reflektance <- dput(reflektance)  # dput() prints the data and returns it unchanged
pH <- read_excel("data/rijen2016.xls", na = "NA")
pH <- na.omit(pH)
pH <- dput(pH)

# Average the spectra per sample and restore the sample ID column name
reflektance <- aggregate(reflektance[, 2:753], list(reflektance$Vzorek), mean)
colnames(reflektance)[colnames(reflektance) == "Group.1"] <- "Vzorek"
datapH <- merge(pH, reflektance, by = "Vzorek")
datasetpH <- data.frame(pH = datapH[, 2], ref = I(as.matrix(datapH[, 3:754], 22, 752)))

Problem is with using "plsr", because result is this error:

ph1<-plsr(pH ~ ref, ncomp = 5, data=datasetpH)
Error in pls::mvr(ref ~ pH, ncomp = 5, data = datasetpH, method = "kernelpls") : 
Invalid number of components, ncomp

Output of dput(reflektance): https://jpst.it/RyyS

Here is the structure of datapH:

'data.frame':   22 obs. of  754 variables:
 $ Vzorek: chr  "5 - P01" "5 - P02" "5 - P03" "5 - R1 - A1" ...
 $ pH/H2O: num  6.96 6.62 7.02 5.62 5.97 6.12 5.64 5.81 5.61 5.47 ...
 $ 325   : num  0.017 0.0266 0.0191 0.0241 0.016 ...
 $ 326   : num  0.021 0.0263 0.0154 0.0264 0.0179 ...
 $ 327   : num  0.0223 0.0238 0.0147 0.028 0.0198 ...
 ...

And here is the structure of datasetpH:

'data.frame':   22 obs. of  2 variables:
 $ pH : num  6.96 6.62 7.02 5.62 5.97 6.12 5.64 5.81 5.61 5.47 ...
 $ ref: AsIs [1:22, 1:752] 0.016983.... 0.026556.... 0.019059.... 0.024097.... 0.016000.... ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "325" "326" "327" "328" ...

Do you have any advice or a solution? Thank you.

Solution

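The problem seems to come from one of your columns containing only NAs. Because plsr() drops incomplete rows by default, a column that is entirely NA leaves no observations to fit, so no number of components is valid.
The last line of the output of names(df) gives: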

[745] "1068"   "1069"   "1070"   "1071"   "1072"   "1073"   "1074"   "1075"   NA  
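To spot such a column without scanning the printed names, one can check both for missing column names and for columns in which every value is NA. A minimal sketch on a tiny made-up stand-in for df: the three wavelength columns reuse values from the str() output above, while the fifth column and its placeholder name `extra` are hypothetical, set to NA to mimic the real data.

```r
# Tiny stand-in for the reflectance data: sample ID, three wavelengths,
# and a trailing column with no data (`extra` is a hypothetical placeholder).
df <- data.frame(
  Vzorek = c("5 - P01", "5 - P02", "5 - P03"),
  `325` = c(0.017, 0.0266, 0.0191),
  `326` = c(0.021, 0.0263, 0.0154),
  `327` = c(0.0223, 0.0238, 0.0147),
  extra = NA,
  check.names = FALSE
)
names(df)[5] <- NA  # mimic the NA column name seen in names(df)

# Positions of columns whose name is missing
na_name <- which(is.na(names(df)))

# TRUE for columns in which every value is missing
all_na <- vapply(df, function(col) all(is.na(col)), logical(1))

# Keep only the columns that actually contain data
df_clean <- df[!all_na]

print(na_name)
print(names(df_clean))
```

Checking the values rather than the names is the more robust of the two tests, since an empty column may also come with a regular-looking name.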

Using your data (the reflektance data frame, named df here) plus some randomly generated values for pH, which isn't in that data frame:

test <- data.frame(pH = rnorm(23, 5, 2), ref = I(as.matrix(df[, 2:753])))
pls::plsr(pH ~ ref, data = test)

Error in matrix(0, ncol = ncomp, nrow = npred) : invalid 'ncol' value (< 0)
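The row-dropping behind this error (and behind "Invalid number of components, ncomp" in the question) can be reproduced with base R alone on simulated data. model.frame() with na.action = na.omit, the usual default, removes every row that has an NA anywhere in the ref matrix. The bound min(nobs - 1, npred) used below is my reading of how pls::mvr() validates ncomp, not a documented formula, so treat it as an assumption.

```r
set.seed(1)
n <- 22
# Simulated spectra: four wavelength columns, the last one entirely NA
ref <- matrix(runif(n * 4, 0.01, 0.03), nrow = n,
              dimnames = list(NULL, c("325", "326", "327", "328")))
ref[, 4] <- NA
d <- data.frame(pH = rnorm(n, mean = 6, sd = 0.5), ref = I(ref))

# na.omit drops any row with an NA anywhere in ref, i.e. all of them
mf <- model.frame(pH ~ ref, data = d, na.action = na.omit)
nobj <- nrow(mf)
npred <- ncol(ref)

# Assumed upper bound on ncomp (my reading of pls::mvr): negative here,
# so ncomp = 5 is rejected, and a default of ncomp = -1 gives the ncol error
max_ncomp <- min(nobj - 1, npred)
print(c(nobj = nobj, max_ncomp = max_ncomp))
```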

Note that the indexing is a bit different from yours: my df doesn't have the second column (the one that contains pH in yours).
If I remove the last column, which contains the NAs:

test <- data.frame(pH = rnorm(23, 5, 2), ref = I(as.matrix(df[, 2:752])))
pls::plsr(pH ~ ref, data = test)
Partial least squares regression , fitted with the kernel algorithm.
Call:
plsr(formula = pH ~ ref, data = test)
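Applied to the question's own datapH, the same idea is to build ref only from columns that contain data, instead of hard-coding the column range. The sketch below uses a simulated stand-in for datapH with the layout shown by str() above (Vzorek, pH/H2O, wavelengths 325 to 1075, plus one empty column). The values are random, the empty column's placeholder name `empty` is hypothetical, and the plsr() call only runs if the pls package is installed.

```r
set.seed(2016)
n <- 22
wl <- as.character(325:1075)  # 751 wavelength columns

# Simulated stand-in for datapH (random values, same column layout)
datapH <- data.frame(
  Vzorek = sprintf("S%02d", seq_len(n)),
  `pH/H2O` = rnorm(n, mean = 6, sd = 0.5),
  matrix(runif(n * length(wl), 0.01, 0.03), nrow = n,
         dimnames = list(NULL, wl)),
  check.names = FALSE
)
datapH$empty <- NA_real_  # hypothetical name for the all-NA last column

# Drop predictor columns that contain no data at all
ref_cols <- datapH[, -(1:2)]
all_na <- vapply(ref_cols, function(col) all(is.na(col)), logical(1))
ref <- as.matrix(ref_cols[!all_na])

datasetpH <- data.frame(pH = datapH[["pH/H2O"]], ref = I(ref))

# Fit only if pls is available; with 22 complete rows, ncomp = 5 is in range
if (requireNamespace("pls", quietly = TRUE)) {
  ph1 <- pls::plsr(pH ~ ref, ncomp = 5, data = datasetpH)
  print(ph1)
}
```

Detecting empty columns by content means the code keeps working if the spreadsheet export later gains or loses such a column.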

In your code, that corresponds to building ref from datapH[, 3:753] instead of datapH[, 3:754]. Let me know if that fixes it.
