R - 如何获得行 &距离矩阵中匹配元素的列下标 [英] R - How to get row & column subscripts of matched elements from a distance matrix

查看：18 发布时间：2021/12/18 23:38:24 r matrix distance euclidean-distance

本文介绍了R - 如何获得行 &距离矩阵中匹配元素的列下标的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个整数向量 vec1 并且我正在使用 dist 函数生成一个远程矩阵.我想获取距离矩阵中某个值的元素的坐标(行和列).本质上，我想获得相距 d 距离的一对元素.例如:

vec1 <- c(2,3,6,12,17)distMatrix <- dist(vec1)# 1 2 3 4#2 1#3 4 3#4 10 9 6#5 15 14 11 5

比如说，我对向量中相距 5 个单位的一对元素感兴趣.我想得到坐标 1 是行和坐标 2 是距离矩阵的列.在这个玩具示例中，我希望

coord1# [1] 5坐标2# [1] 4

我想知道是否有一种不涉及将 dist 对象转换为矩阵或遍历矩阵的有效方法来获取这些值?

解决方案

距离矩阵是压缩格式的下三角矩阵，其中下三角矩阵按列存储为一维向量.您可以通过

查看

str(distMatrix)# Class 'dist' atomic [1:10] 1 4 10 15 3 9 14 6 11 5# ...

即使我们调用dist(vec1, diag = TRUE, upper = TRUE)，向量还是一样的；只有打印样式会发生变化.也就是说，无论你如何调用 dist，你总是得到一个向量.

此答案侧重于如何在 1D 和 2D 索引之间进行转换，以便您可以使用dist"对象，而无需先使用 as.matrix 使其成为完整矩阵.如果您确实想让它成为矩阵，请使用

<小时>

R 函数

为这些索引变换编写矢量化的 R 函数很容易.我们只需要处理越界"索引，应该返回NA.

## 二维索引转一维索引f <- 函数 (i, j, dist_obj) {if (!inherits(dist_obj, "dist")) stop("请提供一个'dist'对象")n <- attr(dist_obj, "Size")有效 <- (i >= 1) &(j >= 1) &(i > j) &(i <= n) &(j <= n)k <- (2 * n - j) * (j - 1)/2 + (i - j)k[!valid] <- NA_real_克}## 一维索引转二维索引finv <- 函数(k，dist_obj){if (!inherits(dist_obj, "dist")) stop("请提供一个'dist'对象")n <- attr(dist_obj, "Size")有效 <- (k >= 1) &(k <= n * (n - 1)/2)k_valid <- k[有效]j <- rep.int(NA_real_, length(k))j[valid] <- floor(((2 * n + 1) - sqrt((2 * n - 1) ^ 2 - 8 * (k_valid - 1)))/2)i <- j + k - (2 * n - j) * (j - 1)/2cbind(i, j)}

这些函数在内存使用方面非常便宜，因为它们使用索引而不是矩阵.

<小时>

将 `finv` 应用于您的问题

你可以使用

vec1 <- c(2,3,6,12,17)distMatrix <- dist(vec1)finv(which(distMatrix == 5), distMatrix)#我j#[1,] 5 4

一般来说，距离矩阵包含浮点数.用==判断两个浮点数是否相等是有风险的.阅读为什么这些数字不相等?了解更多和可能的策略.

<小时>

替代 `dist2mat`

在距离对象上使用 as.matrix 中给出的 dist2mat 函数非常慢；如何让它更快?，我们可以使用which(, arr.ind = TRUE).

库(Rcpp)sourceCpp("dist2mat.cpp")mat <- dist2mat(distMatrix, 128)which(mat == 5, arr.ind = TRUE)# 行列#5 5 4#4 4 5

<小时>

附录:图片的 Markdown(需要 MathJax 支持)

## 二维索引转一维索引下三角看起来像这样: $$egin{pmatrix} 0 &0 &cdots &0\ 	imes &0 &cdots &0\ 	imes &	imes &cdots &0\ vdots &vdots &ddots &0\ 	imes &	imes &cdots &0end{pmatrix}$$ 如果矩阵是$n 	imes n$，那么第一列就有$(n - 1)$个元素("$	imes$")，$(n - j)第 j^个中的 $ 元素柱子.因此，对于下三角中的元素 $(i, j)$ (with $i > j$, $j th中的$	imes$"柱子.所以它是 $$left{frac{(2n - j)(j - 1)}{2} + (i - j)
ight}^{	extit{th}}$$ "$次$"在下三角形中.----## 一维索引转二维索引现在是 $k$^th"$	imes$" 在下三角中，我们如何找到它的矩阵索引$(i, j)$?我们采取两个步骤:1>找到 $j$;2>从 $k$ 和 $j$ 获得 $i$.$j$^th的第一个"$	imes$"列，即 $(j + 1, j)$，是 $left{frac{(2n - j)(j - 1)}{2} + 1
ight}^{	extit{th}}$ "$	imes$" ，因此 $j$ 是最大值，使得 $frac{(2n - j)(j - 1)}{2} + 1 leq k$.这相当于找到最大$j$，使得$$j^2 - (2n + 1)j + 2(k + n - 1) geq 0.$$ LHS是二次多项式，很容易看解是不大于第一个根的整数(即左边的根): $$j = leftlfloorfrac{(2n + 1) - sqrt{(2n-1)^2 - 8(k-1)}}{2}
ight
floor.$$ 那么可以从 $$i = j + k - left{frac{(2n - j)(j - 1)}{2}
ight}.$$

I have an integer vector vec1 and I am generating a distant matrix using dist function. I want to get the coordinates (row and column) of element of certain value in the distance matrix. Essentially I would like to get the pair of elements that are d-distant apart. For example:

vec1 <- c(2,3,6,12,17)
distMatrix <- dist(vec1)

#   1  2  3  4
#2  1         
#3  4  3      
#4 10  9  6   
#5 15 14 11  5

Say, I am interested in pair of elements in the vector that are 5 unit apart. I wanted to get the coordinate1 which are the rows and coordinate2 which are the columns of the distance matrix. In this toy example, I would expect

coord1  
# [1] 5
coord2
# [1] 4

I am wondering if there is an efficient way to get these values that doesn't involve converting the dist object to a matrix or looping through the matrix?

解决方案

A distance matrix is a lower triangular matrix in packed format, where the lower triangular is stored as a 1D vector by column. You can check this via

str(distMatrix)
# Class 'dist'  atomic [1:10] 1 4 10 15 3 9 14 6 11 5
# ...

Even if we call dist(vec1, diag = TRUE, upper = TRUE), the vector is still the same; only the printing styles changes. That is, no matter how you call dist, you always get a vector.

This answer focus on how to transform between 1D and 2D index, so that you can work with a "dist" object without first making it a complete matrix using as.matrix. If you do want to make it a matrix, use the dist2mat function defined in as.matrix on a distance object is extremely slow; how to make it faster?.

R functions

It is easy to write vectorized R functions for those index transforms. We only need some care dealing with "out-of-bound" index, for which NA should be returned.

## 2D index to 1D index
f <- function (i, j, dist_obj) {
  if (!inherits(dist_obj, "dist")) stop("please provide a 'dist' object")
  n <- attr(dist_obj, "Size")
  valid <- (i >= 1) & (j >= 1) & (i > j) & (i <= n) & (j <= n)
  k <- (2 * n - j) * (j - 1) / 2 + (i - j)
  k[!valid] <- NA_real_
  k
  }

## 1D index to 2D index
finv <- function (k, dist_obj) {
  if (!inherits(dist_obj, "dist")) stop("please provide a 'dist' object")
  n <- attr(dist_obj, "Size")
  valid <- (k >= 1) & (k <= n * (n - 1) / 2)
  k_valid <- k[valid]
  j <- rep.int(NA_real_, length(k))
  j[valid] <- floor(((2 * n + 1) - sqrt((2 * n - 1) ^ 2 - 8 * (k_valid - 1))) / 2)
  i <- j + k - (2 * n - j) * (j - 1) / 2
  cbind(i, j)
  }

These functions are extremely cheap in memory usage, as they work with index instead of matrices.

Applying `finv` to your question

You can use

vec1 <- c(2,3,6,12,17)
distMatrix <- dist(vec1)

finv(which(distMatrix == 5), distMatrix)
#     i j
#[1,] 5 4

Generally speaking, a distance matrix contains floating point numbers. It is risky to use == to judge whether two floating point numbers are equal. Read Why are these numbers not equal? for more and possible strategies.

Alternative with `dist2mat`

Using the dist2mat function given in as.matrix on a distance object is extremely slow; how to make it faster?, we may use which(, arr.ind = TRUE).

library(Rcpp)
sourceCpp("dist2mat.cpp")
mat <- dist2mat(distMatrix, 128)
which(mat == 5, arr.ind = TRUE)
#  row col
#5   5   4
#4   4   5

Appendix: Markdown (needs MathJax support) for the picture

## 2D index to 1D index

The lower triangular looks like this: $$egin{pmatrix} 0 & 0 & cdots & 0\ 	imes & 0 & cdots & 0\ 	imes & 	imes & cdots & 0\ vdots & vdots & ddots & 0\ 	imes & 	imes & cdots & 0end{pmatrix}$$ If the matrix is $n 	imes n$, then there are $(n - 1)$ elements ("$	imes$") in the 1st column, and $(n - j)$ elements in the j<sup>th</sup> column. Thus, for element $(i,  j)$ (with $i > j$, $j < n$) in the lower triangular, there are $$(n - 1) + cdots (n - (j - 1)) = frac{(2n - j)(j - 1)}{2}$$ "$	imes$" in the previous $(j - 1)$ columns, and it is the $(i - j)$<sup>th</sup> "$	imes$" in the $j$<sup>th</sup> column. So it is the $$left{frac{(2n - j)(j - 1)}{2} + (i - j)
ight}^{	extit{th}}$$ "$	imes$" in the lower triangular.

----

## 1D index to 2D index

Now for the $k$<sup>th</sup> "$	imes$" in the lower triangular, how can we find its matrix index $(i, j)$? We take two steps: 1> find $j$;  2> obtain $i$ from $k$ and $j$.

The first "$	imes$" of the $j$<sup>th</sup> column, i.e., $(j + 1, j)$, is the $left{frac{(2n - j)(j - 1)}{2} + 1
ight}^{	extit{th}}$ "$	imes$" of the lower triangular, thus $j$ is the maximum value such that $frac{(2n - j)(j - 1)}{2} + 1 leq k$. This is equivalent to finding the max $j$ so that $$j^2 - (2n + 1)j + 2(k + n - 1) geq 0.$$ The LHS is a quadratic polynomial, and it is easy to see that the solution is the integer no larger than its first root (i.e., the root on the left side): $$j = leftlfloorfrac{(2n + 1) - sqrt{(2n-1)^2 - 8(k-1)}}{2}
ight
floor.$$ Then $i$ can be obtained from $$i = j + k - left{frac{(2n - j)(j - 1)}{2}
ight}.$$

这篇关于R - 如何获得行 &距离矩阵中匹配元素的列下标的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

R - 如何获得行 &距离矩阵中匹配元素的列下标 [英] R - How to get row & column subscripts of matched elements from a distance matrix

问题描述

R 函数

将 `finv` 应用于您的问题

替代 `dist2mat`

附录:图片的 Markdown(需要 MathJax 支持)

R functions

Applying `finv` to your question

Alternative with `dist2mat`

Appendix: Markdown (needs MathJax support) for the picture

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

R - 如何获得行 &amp;距离矩阵中匹配元素的列下标 [英] R - How to get row &amp; column subscripts of matched elements from a distance matrix

问题描述

R 函数

将 finv 应用于您的问题

替代 dist2mat

附录:图片的 Markdown(需要 MathJax 支持)

R functions

Applying finv to your question

Alternative with dist2mat

Appendix: Markdown (needs MathJax support) for the picture

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

R - 如何获得行 &距离矩阵中匹配元素的列下标 [英] R - How to get row & column subscripts of matched elements from a distance matrix

将 `finv` 应用于您的问题

替代 `dist2mat`

Applying `finv` to your question

Alternative with `dist2mat`

登录关闭