Computing Mahalanobis distance by group using data.table
Question
I have the following sample data (d1 and d2) and am trying to compute the Mahalanobis distance (mahalanobis.dist from StatMatch) by the variable carb and then append the result to d1.
library(data.table)
library(StatMatch) # provides mahalanobis.dist()
df<-as.data.table(mtcars)[carb %in% c(2,4), .(mpg, carb, vs)] # two groups of carb
d1<-df[vs==0,.(mpg,carb)]
d2<-df[vs==1,.(mpg,carb)]
#for carb==2,
md2<-mahalanobis.dist(d1[carb==2,mpg],d2[carb==2,mpg])
1 2 3 4 5
1 1.0416378 1.626417 1.681240 0.9502661 0.2923896
2 0.7492482 1.334027 1.388850 0.6578765 0.5847791
3 2.1380986 2.722878 2.777701 2.0467269 0.8040713
4 2.1380986 2.722878 2.777701 2.0467269 0.8040713
5 0.4934074 1.078186 1.133010 0.4020356 0.8406200
Dimensions of the matrix md2: rows correspond to rows of d1 and columns to rows of d2.
#for carb==4
md4<-mahalanobis.dist(d1[carb==4,mpg],d2[carb==4,mpg])
1 2
1 0.4602308 0.8181881
2 0.4602308 0.8181881
3 1.2528505 0.8948932
4 2.2500173 1.8920600
5 2.2500173 1.8920600
6 1.1505770 0.7926197
7 1.5085343 1.1505770
8 0.8693248 0.5113676
I wonder whether it is possible to compute this by carb using data.table and then append the result to d1. My approach does not give the right answer, as you can see below:
d1[,mahalanobis.dist(d1[,mpg,by=carb],d2[,mpg,by=carb]),by=carb]
carb V1
1: 2 0.5925119
2: 2 0.3136828
3: 2 0.3136828
4: 2 0.5576583
5: 2 1.6381213
---
178: 4 0.5925119
179: 4 0.3485364
180: 4 2.5443160
181: 4 2.5443160
182: 4 0.9759020
Recommended answer
You don't need separate data sets. Just compute the distance by condition within your original data set:
df[, mahalanobis.dist(mpg[vs == 0], mpg[vs == 1]), keyby = carb]
# carb V1
# 1: 2 1.0416378
# 2: 2 1.6264169
# 3: 2 1.6812399
# 4: 2 0.9502661
# 5: 2 0.2923896
# 6: 2 0.7492482
# 7: 2 1.3340273
# 8: 2 1.3888504
# 9: 2 0.6578765
# ...
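The V1 column above is the per-group distance matrix flattened into a single vector, so it no longer shows which pair of rows each value belongs to. The following is a minimal sketch, not part of the original answer, of one way to keep that information. It assumes the flattening is column-major (R's default for as.vector on a matrix), and the column names car, car_vs0, car_vs1 and dist are hypothetical names introduced only for illustration.

```r
library(data.table)
library(StatMatch)  # provides mahalanobis.dist()

# Keep the car names so each distance can be traced back to a pair of rows.
# "car" is a hypothetical column name chosen for this sketch.
dt <- as.data.table(mtcars, keep.rownames = "car")[carb %in% c(2, 4)]

long <- dt[, {
  # Rows of m correspond to vs == 0, columns to vs == 1 (within this carb group)
  m <- mahalanobis.dist(mpg[vs == 0], mpg[vs == 1])
  data.table(
    car_vs0 = rep(car[vs == 0], times = ncol(m)),  # row index varies fastest
    car_vs1 = rep(car[vs == 1], each = nrow(m)),   # column index varies slowest
    dist    = as.vector(m)                         # column-major flattening
  )
}, keyby = carb]

head(long)
```

Each row of long is one (vs == 0, vs == 1) pair within a carb group, which is usually easier to filter or join than an unlabeled vector.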
Actually, you can run this directly on mtcars without creating any new data sets, for example:
as.data.table(mtcars)[carb %in% c(2, 4),
mahalanobis.dist(mpg[vs == 0], mpg[vs == 1]),
keyby = carb]
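The question also asks for the result to be appended to d1. Because each row of d1 has one distance per matching row of d2, the matrix has to be reduced to one value per row before it fits in a column. The sketch below is an assumption about what might be wanted, not something stated in the original answer: it keeps the smallest distance to any vs == 1 car in the same carb group. The names car, nearest and min_dist are hypothetical.

```r
library(data.table)
library(StatMatch)  # provides mahalanobis.dist()

dt <- as.data.table(mtcars, keep.rownames = "car")[
  carb %in% c(2, 4), .(car, mpg, carb, vs)]
d1 <- dt[vs == 0]

# One summary value per vs == 0 row: the minimum distance within its carb group.
# Taking the minimum is an illustrative choice; any row-wise summary would work.
nearest <- dt[, {
  m <- mahalanobis.dist(mpg[vs == 0], mpg[vs == 1])
  .(car = car[vs == 0], min_dist = apply(m, 1, min))
}, by = carb]

# Update join: add min_dist to d1 by reference, matching on group and car name
d1[nearest, on = .(carb, car), min_dist := i.min_dist]
d1
```

Replacing min with mean or another function in the apply call gives a different per-row summary under the same structure.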