Matrix multiplication in R: requires numeric/complex matrix/vector arguments
Question
I'm using the dataset BreastCancer in the mlbench package, and I am trying to do the following matrix multiplication as part of a logistic regression.
I took the features in the first 10 columns and created a vector of parameters called theta:
X <- BreastCancer[, 1:10]
theta <- data.frame(rep(1, 10))
Then I did the following matrix multiplication:
constant <- as.matrix(X) %*% as.vector(theta[, 1])
However, I got the following error:
Error in as.matrix(X) %*% as.vector(theta[, 1]) :
requires numeric/complex matrix/vector arguments
Do I need to cast the matrix to double using as.numeric(X) first? The values in X look like strings, as they are printed in double quotes.
Recommended answer
Organizing our long-winded discussion in the comments into an answer.
Matrix-multiplication operators and functions such as %*%, crossprod and tcrossprod expect matrices with "numeric", "complex" or "logical" mode. However, your matrix has "character" mode.
library(mlbench)
data(BreastCancer)
X <- as.matrix(BreastCancer[, 1:10])
mode(X)
#[1] "character"
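The same failure can be reproduced without mlbench. The following is a minimal sketch in base R; the matrices A and B and the vector b are made-up stand-ins, not part of the original question.

```r
## a small character matrix, standing in for as.matrix(BreastCancer[, 1:10])
A <- matrix(c("1", "2", "3", "4"), nrow = 2)
b <- c(1, 1)

## %*% refuses character input; capture the error message instead of stopping
res <- tryCatch(A %*% b, error = function(e) conditionMessage(e))
res

## changing the storage mode keeps the dimensions and makes %*% work
B <- A
storage.mode(B) <- "double"
mode(B)
B %*% b
```

This shortcut only works here because every entry of A is a string that parses as a number; it is not a substitute for converting factor columns properly.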
You might be surprised, as the dataset seems to hold numeric data:
head(BreastCancer[, 1:10])
#       Id Cl.thickness Cell.size Cell.shape Marg.adhesion Epith.c.size
#1 1000025            5         1          1             1            2
#2 1002945            5         4          4             5            7
#3 1015425            3         1          1             1            2
#4 1016277            6         8          8             1            3
#5 1017023            4         1          1             3            2
#6 1017122            8        10         10             8            7
#  Bare.nuclei Bl.cromatin Normal.nucleoli Mitoses
#1           1           3               1       1
#2          10           3               2       1
#3           2           3               1       1
#4           4           3               7       1
#5           1           3               1       1
#6          10           9               7       1
But you are being misled by the printing style. These columns are in fact characters or factors:
lapply(BreastCancer[, 1:10], class)
#$Id
#[1] "character"
#
#$Cl.thickness
#[1] "ordered" "factor"
#
#$Cell.size
#[1] "ordered" "factor"
#
#$Cell.shape
#[1] "ordered" "factor"
#
#$Marg.adhesion
#[1] "ordered" "factor"
#
#$Epith.c.size
#[1] "ordered" "factor"
#
#$Bare.nuclei
#[1] "factor"
#
#$Bl.cromatin
#[1] "factor"
#
#$Normal.nucleoli
#[1] "factor"
#
#$Mitoses
#[1] "factor"
When you call as.matrix, these columns are all coerced to "character" (see R: Why am I not getting type or class "factor" after converting columns to factor? for a thorough explanation).
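To see the coercion in isolation, here is a small sketch with a made-up data frame (toy, M and D are illustrative names only). It also shows that data.matrix applied directly to a factor column returns the internal level codes rather than the values printed on screen.

```r
## hypothetical toy data frame: one numeric column and one factor column
toy <- data.frame(x = c(1.5, 2.5, 3.5),
                  size = factor(c("10", "2", "5")))

## as.matrix() falls back to a character matrix because of the factor column
M <- as.matrix(toy)
mode(M)
M[, "size"]

## data.matrix() gives a numeric matrix, but the factor becomes level codes
D <- data.matrix(toy)
mode(D)
D[, "size"]
```

The levels of size are sorted as strings ("10", "2", "5"), so the codes in D are 1, 2, 3 and not 10, 2, 5.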
So, to do the matrix multiplication, we need to correctly coerce these columns to "numeric".
dat <- BreastCancer[, 1:10]
## character to numeric
dat[[1]] <- as.numeric(dat[[1]])
## factor to numeric
dat[2:10] <- lapply( dat[2:10], function (x) as.numeric(levels(x))[x] )
## get the matrix
X <- data.matrix(dat)
mode(X)
#[1] "numeric"
Now you can do, for example, a matrix-vector multiplication.
## some possible matrix-vector multiplications
beta <- runif(10)
yhat <- X %*% beta
## add prediction back to data frame
## drop() turns the n x 1 result into a plain vector before storing it
dat$prediction <- drop(yhat)
However, I doubt this is the correct way to obtain predicted values for your logistic regression model: when you build your model with factors, the model matrix is not the above X but a dummy matrix. I highly recommend using predict.
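As a rough illustration of the dummy-matrix point, the sketch below fits a logistic regression on the built-in mtcars data with cyl treated as a factor. The model am ~ cyl + wt is chosen purely for demonstration and is not the asker's model.

```r
## built-in mtcars data; cyl is turned into a factor predictor (illustrative)
d <- transform(mtcars, cyl = factor(cyl))
fit <- glm(am ~ cyl + wt, data = d, family = binomial())

## the model matrix has dummy columns for cyl, not one numeric cyl column
MM <- model.matrix(fit)
colnames(MM)

## linear predictor computed by hand versus predict()
eta_manual <- drop(MM %*% coef(fit))
eta_pred <- predict(fit, type = "link")
all.equal(eta_manual, eta_pred)

## fitted probabilities on the response scale
p <- predict(fit, type = "response")
head(p)
```

The hand-computed linear predictor only matches because it uses the model matrix built by R; multiplying the coefficients by a matrix of raw factor values would not. Calling predict avoids having to rebuild that matrix for new data.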
This line also worked for me:
as.matrix(sapply(dat, as.numeric))
Looks like you were lucky. The dataset happens to have factor levels that are the same as the numeric values. In general, converting a factor to numeric should use the method I showed. Compare
f <- gl(4, 2, labels = c(12.3, 0.5, 2.9, -11.1))
f
#[1] 12.3 12.3 0.5 0.5 2.9 2.9 -11.1 -11.1
#Levels: 12.3 0.5 2.9 -11.1
as.numeric(f)
#[1] 1 1 2 2 3 3 4 4
as.numeric(levels(f))[f]
#[1] 12.3 12.3 0.5 0.5 2.9 2.9 -11.1 -11.1
This is covered in the documentation page ?factor.
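If this conversion is needed in several places, it could be wrapped in a small helper. The function name factor_to_numeric below is hypothetical (it is not part of base R), and the check for non-numeric levels is an added assumption about what a caller would want.

```r
## hypothetical helper: convert a factor to numeric via its level labels,
## stopping if any level cannot be parsed as a number
factor_to_numeric <- function(f) {
  stopifnot(is.factor(f))
  lev <- suppressWarnings(as.numeric(levels(f)))
  if (anyNA(lev)) stop("some factor levels are not numeric")
  lev[f]
}

f2 <- factor(c("10", "2", "5", "2"))
as.numeric(f2)          # internal level codes
factor_to_numeric(f2)   # values encoded by the labels
```

It can be applied to several columns at once in the same way as in the answer, for example dat[2:10] <- lapply(dat[2:10], factor_to_numeric).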