How to vectorize an operation on a "series" of vectors in R
Question
I have a function in R that takes a scalar and a vector as arguments and performs some operation on them, returning a single value.
Given a "series" of scalars (here, the vector mya) and a "series" of vectors (here, the matrix myv), how can I vectorize the call to myf so that each element of mya is paired with the corresponding column of myv?
mya = 1:3
myv = matrix(1:30, 10, 3)
myf = function(a, v) {
return(sum(a / (a/v + 1)))
}
sapply(1:3, function(x) {myf(mya[x], myv[,x])})
# [1] 7.980123 17.649590 26.809440
In the example above, I would like to avoid the looping sapply call and instead do something like this directly:
myf(mya, myv)
# [1] 49.37443 <- Here I would like 3 values
The big issue here is performance: in my real situation, mya and myv would have more than 10e6 values or vectors respectively, and myf is much more complex.
Recommended answer
Up front: your myv is organized as a series of vectors, one per column. Many tools work better if you convert it into a list of vectors.
asplit(myv, 2)
# [[1]]
# [1] 1 2 3 4 5 6 7 8 9 10
# [[2]]
# [1] 11 12 13 14 15 16 17 18 19 20
# [[3]]
# [1] 21 22 23 24 25 26 27 28 29 30
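asplit() was added in R 3.6.0. On older versions, a minimal base-R sketch of the same column split might look like the following; the name myv_list is introduced here for illustration only.

```r
myv <- matrix(1:30, 10, 3)

# Build a plain list with one element per column of the matrix.
myv_list <- lapply(seq_len(ncol(myv)), function(j) myv[, j])

length(myv_list)   # one list element per column
myv_list[[2]]      # the second column as an ordinary vector
```

The resulting list can be passed to Map or mapply in place of asplit(myv, 2).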
Base R
sapply/lapply are to a single vector or list what mapply/Map are to n of them.
Map(myf, mya, asplit(myv , 2))
# [[1]]
# [1] 7.980123
# [[2]]
# [1] 17.64959
# [[3]]
# [1] 26.80944
mapply(myf, mya, asplit(myv , 2))
# [1] 7.980123 17.649590 26.809440
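If the type of the result matters, the automatic simplification done by mapply can be replaced with vapply over the column indices, which raises an error when a call does not return exactly one numeric value. This is a sketch of the same loop with a stricter return contract, not a speed-up; the name res is introduced here for illustration.

```r
mya <- 1:3
myv <- matrix(1:30, 10, 3)
myf <- function(a, v) sum(a / (a / v + 1))

# numeric(1) declares that every call must return a single double.
res <- vapply(seq_along(mya),
              function(i) myf(mya[i], myv[, i]),
              numeric(1))

# Compare with the mapply result shown above.
all.equal(res, mapply(myf, mya, asplit(myv, 2)))
```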
tidyverse
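The order of the arguments is different: instead of taking each input as a separate argument, pmap takes all of them in a single list, followed by the function.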
purrr::pmap(list(mya, asplit(myv , 2)), myf)
# [[1]]
# [1] 7.980123
# [[2]]
# [1] 17.64959
# [[3]]
# [1] 26.80944
purrr::pmap_dbl(list(mya, asplit(myv , 2)), myf)
# [1] 7.980123 17.649590 26.809440
Alternative approach, given the comments.
This approach truly is vectorized, but it deconstructs the function a little.
colSums(t(mya / (mya / t(myv) + 1)))
# [1] 7.980123 17.649590 26.809440
To get to this point, one needs to recognize where a transpose (t) is necessary. I'll start with a known point:
mya[1] / myv[,1] + 1
# [1] 2.000000 1.500000 1.333333 1.250000 1.200000 1.166667 1.142857 1.125000 1.111111 1.100000
To mimic that with matrices (and not just vectors), we might try
(mya / myv + 1)
# [,1] [,2] [,3]
# [1,] 2.000000 1.181818 1.142857
# [2,] 2.000000 1.250000 1.045455
# [3,] 2.000000 1.076923 1.086957
# [4,] 1.250000 1.142857 1.125000
# [5,] 1.400000 1.200000 1.040000
# [6,] 1.500000 1.062500 1.076923
# [7,] 1.142857 1.117647 1.111111
# [8,] 1.250000 1.166667 1.035714
# [9,] 1.333333 1.052632 1.068966
# [10,] 1.100000 1.100000 1.100000
But notice that the division of mya over myv proceeds column-wise: mya is recycled down each column of myv, so it expands to
c(mya[1] / myv[1,1], mya[2] / myv[2,1], mya[3] / myv[3,1], mya[1] / myv[4,1], ...)
whereas we would prefer each element of mya to be paired with a whole column of myv. So we transpose myv: its columns become rows, and mya then lines up with them during the division.
(mya / t(myv) + 1)[1,]
# [1] 2.000000 1.500000 1.333333 1.250000 1.200000 1.166667 1.142857 1.125000 1.111111 1.100000
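The transpose works because of R's recycling rule: a matrix is stored column by column, and a shorter vector is repeated along that storage order. The following small check makes the implicit expansion explicit; the name recycled is introduced only for illustration.

```r
mya <- 1:3
myv <- matrix(1:30, 10, 3)

# What mya is implicitly expanded to when combined with the 3 x 10 matrix t(myv):
# each column of t(myv) has three entries, so mya lines up with its rows.
recycled <- matrix(rep_len(mya, length(myv)), nrow = ncol(myv))

recycled[, 1:2]                              # every column is 1 2 3
all.equal(mya / t(myv), recycled / t(myv))   # same as the implicit recycling
```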
That's better. Now we need to do the same for the next step, which brings us to
t(mya / (mya / t(myv) + 1))
# [,1] [,2] [,3]
# [1,] 0.5000000 1.692308 2.625000
# [2,] 0.6666667 1.714286 2.640000
# [3,] 0.7500000 1.733333 2.653846
# [4,] 0.8000000 1.750000 2.666667
# [5,] 0.8333333 1.764706 2.678571
# [6,] 0.8571429 1.777778 2.689655
# [7,] 0.8750000 1.789474 2.700000
# [8,] 0.8888889 1.800000 2.709677
# [9,] 0.9000000 1.809524 2.718750
# [10,] 0.9090909 1.818182 2.727273
You wanted one sum for each of the mya values. Since mya has three values and the result has three columns, one might infer that we need to sum each column. We can check that empirically:
sum(mya[1] / (mya[1] / myv[,1] + 1))
# [1] 7.980123
colSums(t(mya / (mya / t(myv) + 1)))
# [1] 7.980123 17.649590 26.809440
But really, we don't need to transpose and then sum the columns when we can skip the transpose and sum the rows :-)
rowSums(mya / (mya / t(myv) + 1))
# [1] 7.980123 17.649590 26.809440
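To gain some confidence that the deconstructed form matches the original function, one can wrap it and compare it against an explicit loop on random data. The name myf_vec and the test sizes below are assumptions for illustration. The rewrite applies only to this particular myf; a more complex function would need its own derivation.

```r
myf <- function(a, v) sum(a / (a / v + 1))

# Vectorized form: a is a vector of scalars, v a matrix with one vector per column.
myf_vec <- function(a, v) rowSums(a / (a / t(v) + 1))

set.seed(1)
a <- runif(1000, 1, 10)                            # 1000 scalars
v <- matrix(runif(50 * 1000, 1, 10), nrow = 50)    # 1000 vectors of length 50

# Reference result from calling myf once per column.
loop_res <- vapply(seq_along(a), function(i) myf(a[i], v[, i]), numeric(1))
all.equal(myf_vec(a, v), loop_res)
```

Wrapping both versions in system.time() on data of realistic size is a simple way to see whether the vectorized form pays off for a given problem.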