mapply for better performance

Question

I want to apply a function to a matrix input a. The function should change the first element to c[a[1]] and each following element to b[a[i], a[i+1]], for i = 1 up to i = ncol(a) - 1.

Example input:

a <- matrix(c(1,4,3,1),nrow=1)
b <- matrix(1:25,ncol=5,nrow=5)
c <- matrix(4:8,ncol=5,nrow=1)

Expected output:

>a
4 16 14 3

#c[a[1]] gave us the first element: 4
#b[a[1],a[2]] gave us the second element: 16 
#b[a[2],a[3]] gave us the third element: 14
#b[a[3],a[4]] gave us the fourth element: 3

I've been trying to use mapply(), without success so far. The idea is to avoid loops, since they can cause a major performance decrease in R.

Recommended answer

Step 1: Addressing a matrix with a single index

In R, matrix elements are stored in column-major order as a vector, so A[i, j] is the same as A[(j-1)*nrow(A) + i]. Consider a random 3-by-3 matrix as an example:

set.seed(1); A <- round(matrix(runif(9), 3, 3), 2)

> A
     [,1] [,2] [,3]
[1,] 0.27 0.91 0.94
[2,] 0.37 0.20 0.66
[3,] 0.57 0.90 0.63

This matrix has 3 rows (nrow(A) = 3). Compare:

A[2,3]  # 0.66
A[(3-1) * 3 + 2]  # 0.66
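
As a quick sanity check, the identity can be tested for every element of A rather than just one. This is only a sketch: lin_index is a hypothetical helper name introduced here for illustration, not a base R function.

```r
# Hypothetical helper (not part of base R): column-major position of A[i, j].
lin_index <- function(i, j, nr) (j - 1) * nr + i

set.seed(1)
A <- round(matrix(runif(9), 3, 3), 2)

# Every (i, j) pair; expand.grid varies i fastest, i.e. column-major order.
ij <- expand.grid(i = seq_len(nrow(A)), j = seq_len(ncol(A)))
k <- lin_index(ij$i, ij$j, nrow(A))

# Element-by-element two-index access, for comparison only.
two_index <- mapply(function(i, j) A[i, j], ij$i, ij$j)

# k runs over 1..9 and A[k] matches the two-index lookups.
stopifnot(all(k == 1:9), all(A[k] == two_index))
```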

Step 2: Vectorization

You can address multiple elements of a matrix at a time. However, you can only do this in single-index mode (this is not quite precise; see alexis_laz's remark later). For example, suppose you want to extract A[1,2] and A[3,1]. If you do:

A[c(1,3), c(2,1)]
#      [,1] [,2]
# [1,] 0.91 0.27
# [2,] 0.90 0.57

You actually get a block: every combination of rows 1 and 3 with columns 2 and 1. If you use a single index instead, you get what you need:

A[3 * (c(2,1) - 1) + c(1,3)]
# [1] 0.91 0.57
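
One way to see how the two results relate: the pairwise elements are the diagonal of the block. The sketch below rebuilds A so that it runs on its own; rows, cols, block and pairs are illustrative names.

```r
set.seed(1)
A <- round(matrix(runif(9), 3, 3), 2)

rows <- c(1, 3)
cols <- c(2, 1)

block <- A[rows, cols]                    # all row/column combinations (2 x 2)
pairs <- A[nrow(A) * (cols - 1) + rows]   # only A[1,2] and A[3,1]

# The pairwise selection is exactly the diagonal of the block.
stopifnot(all(diag(block) == pairs))
```

Extracting the block and then taking its diagonal would also work, but it reads k^2 elements to keep k of them, which is why the single index is preferable here.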

Step 3: Getting the single index for your problem

Suppose n <- length(a), and you want to address these elements of b (row index on the left, column index on the right):

a[1]    a[2]
a[2]    a[3]
 .       .
 .       .
a[n-1]  a[n]

You can do this with the single index nrow(b) * (a[2:n] - 1) + a[1:(n-1)].
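
Worked through on the question's example input, the formula gives the following. The intermediate names rows and cols are introduced here only for readability.

```r
a <- c(1, 4, 3, 1)
b <- matrix(1:25, ncol = 5, nrow = 5)
n <- length(a)

rows <- a[1:(n - 1)]   # a[1], ..., a[n-1]: 1 4 3
cols <- a[2:n]         # a[2], ..., a[n]:   4 3 1
b_ind <- nrow(b) * (cols - 1) + rows

b_ind      # 16 14 3
b[b_ind]   # 16 14 3 (b holds 1:25, so each value equals its own position)
```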

Step 4: Complete solution

Since a and c each have only a single row, you should store them as vectors rather than matrices.

a <- c(1,4,3,1)
c <- 4:8

If you are given matrices and have no choice (as is currently the case in your question), you can convert them into vectors by:

a <- as.numeric(a)
c <- as.numeric(c)

Now, as discussed, we have the index for addressing matrix b:

n <- length(a)
b_ind <- nrow(b) * (a[2:n] - 1) + a[1:(n-1)]

You also take element a[1] of c as the first element of the final result, so we need to concatenate c[a[1]] and b[b_ind]:

a <- c(c[a[1]], b[b_ind])
# > a
# [1]  4 16 14  3
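
To check the result against the mapply() approach the question started from, both can be wrapped in functions and compared. This is a sketch under assumptions: the names remap_vectorized and remap_mapply are hypothetical, and the vectorized version uses a[-n] and a[-1] in place of a[1:(n-1)] and a[2:n].

```r
# Hypothetical wrapper names; a, b, c follow the question's example.
remap_vectorized <- function(a, b, c) {
  a <- as.numeric(a)
  n <- length(a)
  # a[-n] / a[-1] replace a[1:(n-1)] / a[2:n] so that a length-one
  # input does not produce the index 1:0.
  b_ind <- nrow(b) * (a[-1] - 1) + a[-n]
  c(c[a[1]], b[b_ind])
}

# Same result, one R function call per element via mapply().
remap_mapply <- function(a, b, c) {
  a <- as.numeric(a)
  n <- length(a)
  c(c[a[1]], mapply(function(i, j) b[i, j], a[-n], a[-1]))
}

a <- matrix(c(1, 4, 3, 1), nrow = 1)
b <- matrix(1:25, ncol = 5, nrow = 5)
c <- matrix(4:8, ncol = 5, nrow = 1)

remap_vectorized(a, b, c)   # 4 16 14 3
stopifnot(all(remap_vectorized(a, b, c) == remap_mapply(a, b, c)))
```

Naming a variable c works because R skips non-function objects when looking up the function c(), but a different name would be less confusing in real code.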

This approach is fully vectorized, which is even better than the *apply family (those functions still call an R function once per element).

alexis_laz's remark

alexis_laz reminds me that we can use a "matrix index" as well, i.e., we can also address matrix b via:

b[cbind(a[1:(n-1)],a[2:n])]  ## or b[cbind(a[-n], a[-1])]

However, I think using a single index is slightly faster: to address b, the index matrix has to be accessed by row, so we pay a higher memory latency than with a vector index.
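
The speed difference is an expectation rather than a measured result, so it is worth timing on your own machine. The sketch below uses made-up data sizes and hypothetical function names (single_index, matrix_index); it first confirms that both forms select the same elements, then times them. No particular outcome is assumed.

```r
set.seed(42)
b <- matrix(runif(1e6), 1000, 1000)          # illustrative size only
a <- sample.int(1000, 1e6, replace = TRUE)   # illustrative index vector
n <- length(a)

single_index <- function() b[nrow(b) * (a[-1] - 1) + a[-n]]
matrix_index <- function() b[cbind(a[-n], a[-1])]

# Both forms select the same elements.
stopifnot(identical(single_index(), matrix_index()))

# Rough timings; results depend on the machine and the R version.
system.time(for (k in 1:10) single_index())
system.time(for (k in 1:10) matrix_index())
```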
