Computing Mahalanobis distance by group using data.table


Question

I have the following sample data (d1 and d2). I am trying to compute the Mahalanobis distance with mahalanobis.dist separately for each value of the variable carb, and then append the result to d1.

library(data.table)
library(StatMatch)  # provides mahalanobis.dist()

df<-as.data.table(mtcars)[carb %in% c(2,4), .(mpg, carb, vs)] # two groups of carb
d1<-df[vs==0,.(mpg,carb)]
d2<-df[vs==1,.(mpg,carb)]

#for carb==2, 

md2<-mahalanobis.dist(d1[carb==2,mpg],d2[carb==2,mpg])

             1        2        3         4         5
1 1.0416378 1.626417 1.681240 0.9502661 0.2923896
2 0.7492482 1.334027 1.388850 0.6578765 0.5847791
3 2.1380986 2.722878 2.777701 2.0467269 0.8040713
4 2.1380986 2.722878 2.777701 2.0467269 0.8040713
5 0.4934074 1.078186 1.133010 0.4020356 0.8406200

In the matrix md2, rows correspond to rows of d1 and columns to rows of d2.

#for carb==4

md4<-mahalanobis.dist(d1[carb==4,mpg],d2[carb==4,mpg])
              1         2
    1 0.4602308 0.8181881
    2 0.4602308 0.8181881
    3 1.2528505 0.8948932
    4 2.2500173 1.8920600
    5 2.2500173 1.8920600
    6 1.1505770 0.7926197
    7 1.5085343 1.1505770
    8 0.8693248 0.5113676

Is it possible to compute this with data.table, grouped by carb, and then append the result to d1? My attempt below does not give the right answer:

d1[,mahalanobis.dist(d1[,mpg,by=carb],d2[,mpg,by=carb]),by=carb]

     carb        V1
  1:    2 0.5925119
  2:    2 0.3136828
  3:    2 0.3136828
  4:    2 0.5576583
  5:    2 1.6381213
 ---               
178:    4 0.5925119
179:    4 0.3485364
180:    4 2.5443160
181:    4 2.5443160
182:    4 0.9759020


Recommended answer

You don't need separate data sets. Compute the distances within your original data set, splitting each carb group on the vs condition:

df[, mahalanobis.dist(mpg[vs == 0], mpg[vs == 1]), keyby = carb]
#    carb        V1
# 1:    2 1.0416378
# 2:    2 1.6264169
# 3:    2 1.6812399
# 4:    2 0.9502661
# 5:    2 0.2923896
# 6:    2 0.7492482
# 7:    2 1.3340273
# 8:    2 1.3888504
# 9:    2 0.6578765
# ...
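
The result above is a single flat column V1, so the pairing between rows is only implicit. If the goal is to attach something back to d1, one possible approach is to keep the matrix indices next to each distance. The sketch below assumes that mahalanobis.dist returns a matrix with one row per vs == 0 observation and one column per vs == 1 observation (this matches the 8 x 2 shape of md4 in the question). The names pairs, row0, row1, nearest and min_dist are made up for illustration, and taking the minimum per row is just one way to reduce the matrix to a single value per row of d1.

```r
library(data.table)
library(StatMatch)  # provides mahalanobis.dist()

df <- as.data.table(mtcars)[carb %in% c(2, 4), .(mpg, carb, vs)]

# One row per (vs == 0, vs == 1) pair within each carb group.
# as.vector() flattens column by column, the same order as V1 above.
pairs <- df[, {
  m <- mahalanobis.dist(mpg[vs == 0], mpg[vs == 1])
  list(row0 = as.vector(row(m)),  # position among the vs == 0 rows of the group
       row1 = as.vector(col(m)),  # position among the vs == 1 rows of the group
       dist = as.vector(m))
}, keyby = carb]

# Illustrative reduction: distance from each vs == 0 row to its closest vs == 1 row.
nearest <- pairs[, list(min_dist = min(dist)), by = list(carb, row0)]

# Number the rows of d1 within each group the same way, then join.
d1 <- df[vs == 0, .(mpg, carb)]
d1[, row0 := seq_len(.N), by = carb]
d1[nearest, min_dist := i.min_dist, on = c("carb", "row0")]
```

Because d1 is built from df in its original order, row0 in d1 lines up with the row index of the distance matrix inside each carb group. Any other per-row summary (mean, a specific column, a reshaped wide table) could be joined the same way.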

In fact, you can run this directly on mtcars without creating any intermediate data sets, for example:

as.data.table(mtcars)[carb %in% c(2, 4), 
                      mahalanobis.dist(mpg[vs == 0], mpg[vs == 1]), 
                      keyby = carb]
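
As for why the attempt in the question returns 182 rows: inside j, the expressions d1[, mpg, by = carb] and d2[, mpg, by = carb] are evaluated against the full tables, not the current group, and each returns two columns (carb and mpg). Every group would therefore compute the same 13 x 7 = 91 two-variable distances, which is consistent with the 182 rows shown. A small check of the row counts, using only data.table (the column names in check are illustrative):

```r
library(data.table)

df <- as.data.table(mtcars)[carb %in% c(2, 4), .(mpg, carb, vs)]
d1 <- df[vs == 0, .(mpg, carb)]
d2 <- df[vs == 1, .(mpg, carb)]

# .N is the size of the current group; the inner d1[...] / d2[...] calls
# ignore the grouping and always see the whole table.
check <- d1[, list(group_rows    = .N,
                   inner_d1_rows = nrow(d1[, mpg, by = carb]),
                   inner_d1_cols = ncol(d1[, mpg, by = carb]),
                   inner_d2_rows = nrow(d2[, mpg, by = carb])),
            by = carb]
print(check)
```

Referring to the columns directly (mpg, vs) inside j, as in the answer, is what makes the computation respect the grouping.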

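As a sanity check on the numbers: with a single variable, the Mahalanobis distance reduces to an absolute difference divided by a standard deviation. The values printed above are consistent with that standard deviation being taken over both samples combined, which is presumably what mahalanobis.dist does when no covariance matrix is supplied. That is inferred from the output here, not from the StatMatch documentation. A base R sketch (x, y and m are illustrative names):

```r
# mpg for carb == 2, split by vs, taken straight from mtcars
x <- mtcars$mpg[mtcars$carb == 2 & mtcars$vs == 0]
y <- mtcars$mpg[mtcars$carb == 2 & mtcars$vs == 1]

# |x_i - y_j| scaled by the standard deviation of the combined sample
m <- abs(outer(x, y, "-")) / sd(c(x, y))

m[1, 1]  # about 1.0416, the first V1 value for carb == 2
```
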