mapply for better performance
Problem description
I want to apply a function to a matrix input a. This function should change the first element to c[a[1]], and each following element i + 1 to b[a[i], a[i+1]], for i = 1 up to i = ncol(a) - 1.
Sample input:
a <- matrix(c(1,4,3,1),nrow=1)
b <- matrix(1:25,ncol=5,nrow=5)
c <- matrix(4:8,ncol=5,nrow=1)
Expected output:
> a
4 16 14 3
# c[a[1]] gives the first element: 4
# b[a[1], a[2]] gives the second element: 16
# b[a[2], a[3]] gives the third element: 14
# b[a[3], a[4]] gives the fourth element: 3
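For reference, the rule above can be written as a plain loop, which pins down the intended semantics before any vectorization. This is a minimal sketch: the function name `loop_version` is hypothetical, `cc` holds the question's c (renamed only for readability), and it assumes every entry of a is a valid row/column index of b and a valid index of cc.

```r
# Hypothetical reference implementation of the rule in the question.
loop_version <- function(a, b, cc) {
  a <- as.vector(a)                    # drop the 1-row matrix shape
  n <- length(a)
  out <- numeric(n)
  out[1] <- cc[a[1]]                   # first element is taken from cc
  for (i in seq_len(n - 1)) {
    out[i + 1] <- b[a[i], a[i + 1]]    # element i + 1 is b[a[i], a[i+1]]
  }
  out
}

a  <- matrix(c(1, 4, 3, 1), nrow = 1)
b  <- matrix(1:25, ncol = 5, nrow = 5)
cc <- matrix(4:8, ncol = 5, nrow = 1)
loop_version(a, b, cc)
# [1]  4 16 14  3
```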
I've been trying to use mapply() without any success so far. The idea is to avoid loops, since they can lead to a major performance decrease in R.
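For comparison, one way mapply() can be made to produce the expected output is sketched below. It pairs a[-n] (row indices) with a[-1] (column indices) and prepends the element taken from c. This is an illustrative sketch, not the question author's code; `cc` stands for the question's c, and mapply() still calls the anonymous R function once per pair, so it behaves much like a compact loop.

```r
a  <- c(1, 4, 3, 1)
b  <- matrix(1:25, ncol = 5, nrow = 5)
cc <- 4:8                      # the question's c, stored as a vector
n  <- length(a)

# One call of the anonymous function per (a[i], a[i+1]) pair.
pairs_from_b <- mapply(function(i, j) b[i, j], a[-n], a[-1])

# Prepend the first element, which comes from cc.
result <- c(cc[a[1]], pairs_from_b)
result
# [1]  4 16 14  3
```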
Recommended answer
Step 1: Addressing a matrix with a single index
In R, matrix elements are stored in a vector in column-major order, so A[i, j] is the same as A[(j-1)*nrow(A) + i]. Consider an example with a random 3-by-3 matrix:
set.seed(1); A <- round(matrix(runif(9), 3, 3), 2)
> A
     [,1] [,2] [,3]
[1,] 0.27 0.91 0.94
[2,] 0.37 0.20 0.66
[3,] 0.57 0.90 0.63
Now, this matrix has 3 rows (nrow(A) = 3). Compare:
A[2,3] # 0.66
A[(3-1) * 3 + 2] # 0.66
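The column-major identity can be checked for every cell of A at once. A minimal sketch; the helper `sub_to_ind` is a hypothetical name (not from the answer) for the formula (j-1)*nrow(A) + i.

```r
set.seed(1); A <- round(matrix(runif(9), 3, 3), 2)

# Hypothetical helper: map (row, column) subscripts to the single
# column-major index described in the text.
sub_to_ind <- function(i, j, nr) (j - 1) * nr + i

# Row and column number of every cell.
i <- as.vector(row(A))
j <- as.vector(col(A))

# Read each cell twice: once with two subscripts, once with one index.
two_sub <- vapply(seq_along(i), function(k) A[i[k], j[k]], numeric(1))
one_sub <- A[sub_to_ind(i, j, nrow(A))]
stopifnot(identical(two_sub, one_sub))

sub_to_ind(2, 3, nrow(A))   # 8, so A[8] is the same cell as A[2, 3]
```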
Step 2: Vectorization
You can address multiple elements of a matrix at a time. However, you can only do this by using single indexing (not quite precise; see @alexis_laz's remark later). For example, suppose you want to extract A[1,2] and A[3,1]. If you do:
A[c(1,3), c(2,1)]
#      [,1] [,2]
# [1,] 0.91 0.27
# [2,] 0.90 0.57
You actually get a block: every combination of the requested rows and columns. If you use single indexing instead, you get exactly what you need:
A[3 * (c(2,1) - 1) + c(1,3)]
# [1] 0.91 0.57
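The difference between the two indexing modes can be seen directly: the pairs you want sit on the diagonal of the block, so two-subscript indexing computes length(rows) * length(cols) elements only to keep length(rows) of them. A small sketch; the variable names are illustrative.

```r
set.seed(1); A <- round(matrix(runif(9), 3, 3), 2)

rows <- c(1, 3)
cols <- c(2, 1)

# Two subscripts: every row/column combination (a 2-by-2 block).
block <- A[rows, cols]

# Single index: only the pairs (1, 2) and (3, 1).
pairs <- A[nrow(A) * (cols - 1) + rows]

# The wanted pairs are exactly the diagonal of the block.
stopifnot(identical(pairs, diag(block)))
pairs
# [1] 0.91 0.57
```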
Step 3: Getting the single indices for your problem
Suppose n <- length(a), and you want to address these elements of b (row index on the left, column index on the right):
a[1]    a[2]
a[2]    a[3]
 ...     ...
a[n-1]  a[n]
You can address them with the single index nrow(b) * (a[2:n] - 1) + a[1:(n-1)].
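Applying the formula to the sample input, and checking it against looking up b[a[i], a[i+1]] one pair at a time, can be sketched as follows (a minimal check; `rows` and `cols` are illustrative names).

```r
a <- c(1, 4, 3, 1)
b <- matrix(1:25, ncol = 5, nrow = 5)
n <- length(a)

rows <- a[1:(n - 1)]   # row index a[i]
cols <- a[2:n]         # column index a[i+1]

# Single column-major index of each (row, column) pair.
b_ind <- nrow(b) * (cols - 1) + rows
b_ind
# [1] 16 14  3

# Same elements as indexing b[a[i], a[i+1]] pair by pair.
one_by_one <- sapply(seq_len(n - 1), function(i) b[a[i], a[i + 1]])
stopifnot(all(b[b_ind] == one_by_one))
```

Because b in this example is filled with 1:25, each single index happens to equal the value stored at that position, which is why b_ind already reads 16 14 3.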
Step 4: The complete solution
Since a and c each have only a single row, you should store them as vectors rather than matrices.
a <- c(1,4,3,1)
c <- 4:8
If you are given matrices and have no choice (as in your question), you can convert them into vectors with:
a <- as.numeric(a)
c <- as.numeric(c)
Now, as discussed, we compute the single indices for addressing matrix b:
n <- length(a)
b_ind <- nrow(b) * (a[2:n] - 1) + a[1:(n-1)]
The first element of your final result is element a[1] of c, so we concatenate c[a[1]] and b[b_ind]:
a <- c(c[a[1]], b[b_ind])
# > a
# [1] 4 16 14 3
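The steps above can be wrapped into a reusable function. This is a sketch under stated assumptions: the name `vec_version` is hypothetical, `cc` plays the role of c (renamed so the body does not read like a call to c()), inputs may be vectors or 1-row matrices, and a guard handles length(a) == 1, where a[2:n] would otherwise produce NA.

```r
# Hypothetical wrapper around Steps 3 and 4.
vec_version <- function(a, b, cc) {
  a <- as.vector(a)                      # accept a 1-row matrix as well
  n <- length(a)
  if (n < 2) return(cc[a[1]])            # no (a[i], a[i+1]) pairs to look up
  b_ind <- nrow(b) * (a[2:n] - 1) + a[1:(n - 1)]
  c(cc[a[1]], b[b_ind])                  # single indexing works on cc too
}

a  <- matrix(c(1, 4, 3, 1), nrow = 1)
b  <- matrix(1:25, ncol = 5, nrow = 5)
cc <- matrix(4:8, ncol = 5, nrow = 1)
vec_version(a, b, cc)
# [1]  4 16 14  3
```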
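This approach is fully vectorized: the index arithmetic and the subsetting each run once over whole vectors, which is even better than the *apply family, since mapply() and friends still call an R function once per element.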
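One way to compare the plain loop, mapply(), and the vectorized version on a larger random input is sketched below. The sizes, seed, and function names are arbitrary choices for illustration, and timings depend on the machine and R version, so no numbers are claimed here; the sketch first checks that all three give identical results.

```r
set.seed(42)
k  <- 100
b  <- matrix(runif(k * k), k, k)
cc <- runif(k)
a  <- sample(k, 1e5, replace = TRUE)

loop_fun <- function(a, b, cc) {
  n <- length(a)
  out <- numeric(n)
  out[1] <- cc[a[1]]
  for (i in seq_len(n - 1)) out[i + 1] <- b[a[i], a[i + 1]]
  out
}
mapply_fun <- function(a, b, cc) {
  n <- length(a)
  c(cc[a[1]], mapply(function(i, j) b[i, j], a[-n], a[-1]))
}
vec_fun <- function(a, b, cc) {
  n <- length(a)
  c(cc[a[1]], b[nrow(b) * (a[-1] - 1) + a[-n]])
}

t_loop   <- system.time(r_loop   <- loop_fun(a, b, cc))
t_mapply <- system.time(r_mapply <- mapply_fun(a, b, cc))
t_vec    <- system.time(r_vec    <- vec_fun(a, b, cc))

# All three approaches must agree before their timings mean anything.
stopifnot(identical(r_loop, r_vec), identical(r_mapply, r_vec))

# Elapsed seconds for each approach.
c(loop = t_loop[["elapsed"]], mapply = t_mapply[["elapsed"]],
  vectorized = t_vec[["elapsed"]])
```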
alexis_laz's remark
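alexis_laz reminded me that we can also use matrix indexing, i.e., we can address matrix b with a two-column matrix of (row, column) pairs built from the original a (before it is overwritten in Step 4):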
b[cbind(a[1:(n-1)],a[2:n])] ## or b[cbind(a[-n], a[-1])]
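The two forms select the same elements, which can be confirmed on the sample input. A minimal sketch; `idx`, `by_matrix`, and `by_single` are illustrative names.

```r
a <- c(1, 4, 3, 1)          # the original a
b <- matrix(1:25, ncol = 5, nrow = 5)
n <- length(a)

# Column 1 holds row numbers, column 2 holds column numbers.
idx <- cbind(a[-n], a[-1])
idx
#      [,1] [,2]
# [1,]    1    4
# [2,]    4    3
# [3,]    3    1

by_matrix <- b[idx]
by_single <- b[nrow(b) * (a[-1] - 1) + a[-n]]
stopifnot(identical(by_matrix, by_single))
by_matrix
# [1] 16 14  3
```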
However, I think using a single index is slightly faster: to address b through an index matrix, the index matrix has to be read row by row, so we pay a higher memory latency than with a plain index vector.
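Since this speed difference is stated as a guess, one way to measure it on your own machine is sketched below. The sizes and repetition count are arbitrary assumptions, both timings include building the index (the arithmetic for the single index, cbind() for the index matrix), and results will vary, so no outcome is claimed.

```r
set.seed(1)
k    <- 1000
b    <- matrix(runif(k * k), k, k)
a    <- sample(k, 1e6, replace = TRUE)
n    <- length(a)
reps <- 10

# Single (column-major) index, recomputed on every repetition.
t_single <- system.time(
  for (r in seq_len(reps)) s <- b[nrow(b) * (a[-1] - 1) + a[-n]]
)

# Two-column index matrix, rebuilt on every repetition.
t_matrix <- system.time(
  for (r in seq_len(reps)) m <- b[cbind(a[-n], a[-1])]
)

# Both forms must select the same elements.
stopifnot(identical(s, m))
c(single = t_single[["elapsed"]], matrix = t_matrix[["elapsed"]])
```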