What is the best way to store distances with H2O?

Problem description

Suppose I have two data frames and I want to calculate the Euclidean distance between every row of one and every row of the other. My code is:

set.seed(121)
# Load library
library(h2o)
system.time({
  h2o.init()
  # Create the df and convert to h2o frame format
  df1 <- as.h2o(matrix(rnorm(7500 * 40), ncol = 40))
  df2 <- as.h2o(matrix(rnorm(1250 * 40), ncol = 40))
  # Create a matrix to record the distances:
  # one row per row of df1, one column per row of df2
  matrix1 <- as.h2o(matrix(0, nrow = 7500, ncol = 1250))
  # Loop over the rows of df2 and compute the distance to every row of df1
  for (i in 1:nrow(df2)){
    matrix1[, i] <- h2o.sqrt(h2o.distance(df1, df2[i, ]))
  }
})

I'm sure there is a more efficient way to store the distances in a matrix.

Recommended answer

You don't need to calculate the distances inside a loop: H2O's distance function can efficiently calculate the distances for all rows at once. For two data frames with dimensions n x k and m x k, you can obtain the n x m distance matrix as follows:

distance_matrix <- h2o.distance(df1, df2, 'l2')

There is no need to take the square root, because the h2o.distance() function lets you specify which distance measure to use: "l1" (absolute distance, L1 norm), "l2" (Euclidean distance, L2 norm), "cosine" (cosine similarity), and "cosine_sq" (squared cosine similarity).
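To make the n x m layout concrete, here is a minimal base-R sketch (no H2O involved) that computes the "l1" and "l2" measures for two tiny matrices. The names a, b, l1 and l2 are made up for illustration, and the code only mirrors the definitions of the measures; it is not how H2O computes them internally.

```r
# Base-R sketch: pairwise distances between the rows of two small matrices.
a <- matrix(c(0, 0,
              3, 4,
              1, 1), ncol = 2, byrow = TRUE)   # n = 3 rows, k = 2 columns
b <- matrix(c(0, 0,
              6, 8), ncol = 2, byrow = TRUE)   # m = 2 rows, k = 2 columns

# Euclidean (L2) distance via |a_i|^2 + |b_j|^2 - 2 * (a_i . b_j)
sq <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)
l2 <- sqrt(pmax(sq, 0))   # pmax guards against tiny negative rounding errors

# Absolute (L1) distance: sum of |a_i - b_j| over the k columns
l1 <- apply(b, 1, function(row) rowSums(abs(sweep(a, 2, row))))

dim(l2)   # n x m: one row per row of a, one column per row of b
```

Entry [i, j] of each result is the distance between row i of a and row j of b, which is the same orientation the answer describes for h2o.distance(df1, df2, ...).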

Following your example, the code to calculate the Euclidean distance matrix would be:

library(h2o)
h2o.init()
df1 <- as.h2o(matrix(rnorm(7500 * 40), ncol = 40))
df2 <- as.h2o(matrix(rnorm(1250 * 40), ncol = 40))
distance_matrix <- h2o.distance(df1, df2, 'l2')

This produces a matrix with 7500 rows x 1250 columns.
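The result of h2o.distance() is an H2O frame that lives on the H2O cluster. If you want the distances as an ordinary R matrix, one option is to check the dimensions and then pull the frame into the R session. The sketch below assumes a running local H2O cluster and uses smaller frames than the example to keep the transfer quick; small1, small2, small_dist and local_matrix are illustrative names only.

```r
library(h2o)
h2o.init()

# Smaller frames than in the example, purely to keep the sketch fast
small1 <- as.h2o(matrix(rnorm(20 * 4), ncol = 4))
small2 <- as.h2o(matrix(rnorm(5 * 4), ncol = 4))

# One call computes all pairwise Euclidean distances on the cluster
small_dist <- h2o.distance(small1, small2, "l2")

# Rows correspond to rows of small1, columns to rows of small2
dim(small_dist)

# Copy the result into the R session as a regular numeric matrix
local_matrix <- as.matrix(small_dist)
```

For large results it may be preferable to keep the distances in the H2O frame and only download the slices you need, since as.matrix() copies every value into local memory.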
