How to vectorize an operation on a "series" of vectors in R


Question

I have a function in R that takes a scalar and a vector as arguments and performs some operation on them, returning a single value.

Given a "series" of scalars (here, the vector mya) and a "series" of vectors (here, the columns of the matrix myv), how can I vectorize the call to myf so that each element of mya is paired with the corresponding column of myv?

mya = 1:3
myv = matrix(1:30, 10, 3)

myf = function(a, v) {
  return(sum(a / (a/v + 1)))
}

sapply(1:3, function(x) {myf(mya[x], myv[,x])})
# [1]  7.980123 17.649590 26.809440

So, above, I would like to avoid the looping sapply operation and directly do something like:

myf(mya, myv)
# [1] 49.37443   <- Here I would like 3 values

The big issue here is performance: in my real situation, mya and myv would hold more than 10e6 values and vectors respectively, and myf is much more complex.

Recommended answer

Up front: your myv is organized as a series of vectors, one per column, but many tools work better if you convert it into a list of vectors.

asplit(myv, 2)
# [[1]]
#  [1]  1  2  3  4  5  6  7  8  9 10
# [[2]]
#  [1] 11 12 13 14 15 16 17 18 19 20
# [[3]]
#  [1] 21 22 23 24 25 26 27 28 29 30
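As far as I know, asplit() was added in R 3.6.0, so on an older installation the same list of columns has to be built another way. A minimal sketch of two base R alternatives follows; the names cols_lapply and cols_df are made up for illustration.

```r
myv <- matrix(1:30, 10, 3)

# Pull out each column by index; works without asplit().
cols_lapply <- lapply(seq_len(ncol(myv)), function(j) myv[, j])

# A data frame is a list of columns, so this also gives one vector per column.
# unname() drops the automatic V1, V2, V3 names.
cols_df <- unname(as.list(as.data.frame(myv)))

# Both should hold the same three length-10 vectors.
str(cols_lapply)
str(cols_df)
```

Either list can be passed wherever asplit(myv, 2) is used in the rest of this answer.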

Base R

sapply/lapply are to a single vector/list as mapply/Map are to n of them.

Map(myf, mya, asplit(myv , 2))
# [[1]]
# [1] 7.980123
# [[2]]
# [1] 17.64959
# [[3]]
# [1] 26.80944
mapply(myf, mya, asplit(myv , 2))
# [1]  7.980123 17.649590 26.809440
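If you would rather keep the index-based loop from the question, vapply() is a stricter drop-in for sapply(): it fails loudly if any call returns something other than a single number. This is only a sketch of that variant. It is still a loop, not a vectorized computation.

```r
mya <- 1:3
myv <- matrix(1:30, 10, 3)
myf <- function(a, v) sum(a / (a / v + 1))

# numeric(1) declares that each call must return exactly one numeric value.
res <- vapply(seq_along(mya), function(i) myf(mya[i], myv[, i]), numeric(1))
res
# [1]  7.980123 17.649590 26.809440
```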

tidyverse

The order of arguments is different, and instead of taking the inputs as separate arguments, pmap requires all of them in a single list.

purrr::pmap(list(mya, asplit(myv , 2)), myf)
# [[1]]
# [1] 7.980123
# [[2]]
# [1] 17.64959
# [[3]]
# [1] 26.80944
purrr::pmap_dbl(list(mya, asplit(myv , 2)), myf)
# [1]  7.980123 17.649590 26.809440
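With exactly two inputs, purrr's map2 family is a slightly shorter spelling of the same idea. A sketch, assuming purrr is installed (the guard skips the call otherwise); cols is an illustrative name for the list of columns.

```r
mya <- 1:3
myv <- matrix(1:30, 10, 3)
myf <- function(a, v) sum(a / (a / v + 1))

# List of columns, built with lapply() so the sketch does not depend on asplit().
cols <- lapply(seq_len(ncol(myv)), function(j) myv[, j])

# Only runs when purrr is available.
if (requireNamespace("purrr", quietly = TRUE)) {
  # map2_dbl() walks mya and cols in parallel and returns a numeric vector.
  res <- purrr::map2_dbl(mya, cols, myf)
  print(res)
}
```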


Alternative approach, given the comments.

This approach truly is vectorized, but it deconstructs the function a little.

colSums(t(mya / (mya / t(myv) + 1)))
# [1]  7.980123 17.649590 26.809440

To get to this point, one needs to recognize where a transpose is necessary. I'll start with a known point, the denominator for the first column:

mya[1] / myv[,1] + 1
#  [1] 2.000000 1.500000 1.333333 1.250000 1.200000 1.166667 1.142857 1.125000 1.111111 1.100000

To mimic that with the whole matrix (and not just one vector), we might try

(mya / myv + 1)
#           [,1]     [,2]     [,3]
#  [1,] 2.000000 1.181818 1.142857
#  [2,] 2.000000 1.250000 1.045455
#  [3,] 2.000000 1.076923 1.086957
#  [4,] 1.250000 1.142857 1.125000
#  [5,] 1.400000 1.200000 1.040000
#  [6,] 1.500000 1.062500 1.076923
#  [7,] 1.142857 1.117647 1.111111
#  [8,] 1.250000 1.166667 1.035714
#  [9,] 1.333333 1.052632 1.068966
# [10,] 1.100000 1.100000 1.100000

But notice that R recycles mya down the columns of myv, so the division expands to

c(mya[1] / myv[1,1], mya[2] / myv[2,1], mya[3] / myv[3,1], mya[1] / myv[4,1], ...)

whereas we want mya[j] paired with column j of myv. So we transpose myv, which makes recycling pair each mya value with one row of t(myv), that is, with one column of myv.

(mya / t(myv) + 1)[1,]
#  [1] 2.000000 1.500000 1.333333 1.250000 1.200000 1.166667 1.142857 1.125000 1.111111 1.100000
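The recycling rule behind this step is easy to check on a toy case. The sketch below uses made-up names (a, aligned, wrapped) and a matrix of ones, so the result shows which element of a each cell was paired with.

```r
a <- 1:3

# 3 rows: length(a) equals the column length, so row i always meets a[i].
aligned <- a * matrix(1, nrow = 3, ncol = 4)

# 4 rows: a wraps around inside each column, so rows no longer line up with a.
wrapped <- a * matrix(1, nrow = 4, ncol = 3)

aligned
wrapped
```

The first case is the situation t(myv) creates: as many rows as there are scalars in mya.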

That's better. Now we need to do the same for the next step, the outer division. That brings us to

t(mya / (mya / t(myv) + 1))
#            [,1]     [,2]     [,3]
#  [1,] 0.5000000 1.692308 2.625000
#  [2,] 0.6666667 1.714286 2.640000
#  [3,] 0.7500000 1.733333 2.653846
#  [4,] 0.8000000 1.750000 2.666667
#  [5,] 0.8333333 1.764706 2.678571
#  [6,] 0.8571429 1.777778 2.689655
#  [7,] 0.8750000 1.789474 2.700000
#  [8,] 0.8888889 1.800000 2.709677
#  [9,] 0.9000000 1.809524 2.718750
# [10,] 0.9090909 1.818182 2.727273

You wanted one sum for each of the mya values. Knowing that mya has three values and seeing three columns, one might infer that we need to sum each column. We can check that empirically:

sum(mya[1] / (mya[1] / myv[,1] + 1))
# [1] 7.980123
colSums(t(mya / (mya / t(myv) + 1)))
# [1]  7.980123 17.649590 26.809440

But really, we don't need to transpose and then sum the columns when we can skip the transpose and sum the rows :-)

rowSums(mya / (mya / t(myv) + 1))
# [1]  7.980123 17.649590 26.809440
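Before trusting a rewrite like this on real data, it is worth checking it against the loop on random input. The sketch below does that with made-up sizes (n_vec, len) and an arbitrary seed. It also tries a variant that I am adding as an assumption, not part of the original answer: expanding mya with rep(each = ...) so that no transpose is needed at all.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n_vec <- 1000                    # number of scalars / columns (illustrative)
len   <- 10                      # length of each vector (illustrative)
mya <- runif(n_vec, 1, 5)
myv <- matrix(runif(len * n_vec, 1, 5), len, n_vec)
myf <- function(a, v) sum(a / (a / v + 1))

# Reference: the loop from the question.
looped <- sapply(seq_along(mya), function(i) myf(mya[i], myv[, i]))

# Vectorized with a transpose, as derived above.
by_rows <- rowSums(mya / (mya / t(myv) + 1))

# Vectorized without a transpose: repeat each scalar once per row of myv.
a_full  <- rep(mya, each = nrow(myv))
by_cols <- colSums(a_full / (a_full / myv + 1))

all.equal(looped, by_rows)
all.equal(looped, by_cols)

# Timings depend on the machine and on the real myf; treat them as a rough guide.
system.time(sapply(seq_along(mya), function(i) myf(mya[i], myv[, i])))
system.time(rowSums(mya / (mya / t(myv) + 1)))
```

Whether this carries over to the real, more complex myf depends on whether every step of that function can be written with element-wise operations plus a row or column reduction.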
