Mahalanobis distance in R between 2 groups
Problem description
I have two groups, each with 3 variables, as follows:
Group1:
cost time quality
[1,] 90 4 70
[2,] 4 27 37
[3,] 82 4 17
[4,] 18 41 4
Group2:
cost time quality
[1,] 4 27 4
The code I use to calculate the Mahalanobis distance between the two groups is as follows:
benchmark<-rbind(c(90,4,70),c(4,27,37),c(82,4,17),c(18,41,4))
colnames(benchmark)=c('cost','time','quality')
current=rbind(c(4,27,4))
colnames(current)=c('cost','time','quality')
bdm<-as.matrix(benchmark)
cdm<-as.matrix(current)
mat1<-matrix(bdm,ncol=ncol(bdm),dimnames=NULL)
mat2<-matrix(cdm,ncol=ncol(cdm),dimnames=NULL)
#center Data
mat1.1<-scale(mat1,center = T,scale = F)
mat2.1<-scale(mat2,center=T,scale=F)
#cov Matrix
mat1.2<-cov(mat1.1,method="pearson")
mat2.2<-cov(mat2.1,method="pearson")
#the pooled covariance is calculated using weighted average
n1<-nrow(mat1)
n2<-nrow(mat2)
n3<-n1+n2
#pooled matrix
mat3<-((n1/n3)*mat1.2) + ((n2/n3)*mat2.2)
mat4<-solve(mat3)
#Mean diff
mat5<-as.matrix((colMeans(mat1)-colMeans(mat2)))
#multiply
mat6<-t(mat5)%*%mat4
#Mahalanobis distance
sqrt(mat6 %*% mat5)
The result is NA, but when I enter the same values into the calculator at the following link (calculate mahalanobis distance), it reports a Mahalanobis distance between group1 and group2 of 2.4642.
Moreover, the error message that I get is:
Error in ((n1/n3) * mat1.2) + ((n2/n3) * mat2.2) : non-conformable arrays
and the warning message:
In colMeans(mat1) - colMeans(mat2) :
longer object length is not a multiple of shorter object length
Recommended answer
I felt like what you are trying to do must already exist in some R package. After a pretty thorough search, I found the function D.sq in package asbio, which looks very close. This function requires 2 matrices as input, so it doesn't work for your example as written. I also include a modified version that accepts a vector in place of the 2nd matrix.
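As an aside on where the NA in the question may come from: the sample covariance of a single observation is undefined, so cov() on a one-row matrix returns NA, and that NA propagates into any weighted sum. The snippet below is only a minimal sketch of that effect using the question's data; it does not try to reproduce the exact error message quoted above.

```r
# Group 1 (4 observations) and Group 2 (1 observation) from the question
benchmark <- rbind(c(90, 4, 70), c(4, 27, 37), c(82, 4, 17), c(18, 41, 4))
current   <- rbind(c(4, 27, 4))

S1 <- cov(benchmark)  # well defined: 4 rows
S2 <- cov(current)    # undefined: only 1 row, so every entry is NA

# Size-weighted average as in the question; the NA entries poison the sum
n1 <- nrow(benchmark)
n2 <- nrow(current)
pooled <- (n1 / (n1 + n2)) * S1 + (n2 / (n1 + n2)) * S2

anyNA(pooled)
```

Any pooled estimate therefore has to decide explicitly how a one-observation group contributes, for example by giving it zero weight.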
# Original Function
D.sq <- function (g1, g2) {
    dbar <- as.vector(colMeans(g1) - colMeans(g2))
    S1 <- cov(g1)
    S2 <- cov(g2)
    n1 <- nrow(g1)
    n2 <- nrow(g2)
    V <- as.matrix((1/(n1 + n2 - 2)) * (((n1 - 1) * S1) + ((n2 - 1) * S2)))
    D.sq <- t(dbar) %*% solve(V) %*% dbar
    res <- list()
    res$D.sq <- D.sq
    res$V <- V
    res
}
# Data
g1 <- matrix(c(90, 4, 70, 4, 27, 37, 82, 4, 17, 18, 41, 4), ncol = 3, byrow = TRUE)
g2 <- c(2, 27, 4)
# Function modified to accept a vector for g2 rather than a matrix
D.sq2 <- function (g1, g2) {
    dbar <- as.vector(colMeans(g1) - g2)
    S1 <- cov(g1)
    S2 <- var(g2)
    n1 <- nrow(g1)
    n2 <- length(g2)
    V <- as.matrix((1/(n1 + n2 - 2)) * (((n1 - 1) * S1) + ((n2 - 1) * S2)))
    D.sq <- t(dbar) %*% solve(V) %*% dbar
    res <- list()
    res$D.sq <- D.sq
    res$V <- V
    res
}
However, this doesn't quite give the answer you expect: D.sq2(g1,g2)$D.sq returns 2.2469.
Perhaps you can compare your original matlab method with these details and figure out the source of the difference. A quick look suggests the difference is in how the denominator in V is computed. It may well also be an error on my part, but hopefully this gets you going.
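To make the point about the denominator concrete, here is a minimal sketch of one convention that does land on the quoted 2.4642. It assumes population (divide-by-n) covariances weighted by group size, with the single observation in Group2 contributing a zero covariance matrix. I can't confirm that this is what the linked calculator does; pop_cov is a hypothetical helper written for this sketch, not a function from asbio or base R. Note also that the sketch uses the Group2 values from the question (4, 27, 4), whereas the g2 vector above starts with 2, and that it returns the distance itself, whereas D.sq2 returns a squared distance.

```r
# Data from the question; Group2 is a single observation
g1 <- matrix(c(90, 4, 70,
               4, 27, 37,
               82, 4, 17,
               18, 41, 4), ncol = 3, byrow = TRUE)
g2 <- matrix(c(4, 27, 4), ncol = 3)

# Hypothetical helper: population covariance (divides by n, not n - 1).
# For a one-row matrix this is a matrix of zeros rather than NA.
pop_cov <- function(x) {
    xc <- scale(x, center = TRUE, scale = FALSE)
    crossprod(xc) / nrow(x)
}

n1 <- nrow(g1)
n2 <- nrow(g2)

# Assumed pooling rule: size-weighted average of the population covariances
V <- (n1 * pop_cov(g1) + n2 * pop_cov(g2)) / (n1 + n2)

# Difference of group means, then the Mahalanobis distance (not squared)
dbar <- colMeans(g1) - colMeans(g2)
D <- sqrt(drop(t(dbar) %*% solve(V) %*% dbar))
D  # about 2.4642 for these data
```

Switching the divisor in V to n1 + n2 - 2 with sample covariances, as D.sq does, changes the result, which is consistent with the denominator being the source of the discrepancy.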