为什么要“向量化"?这个简单的R循环会产生不同的结果吗? [英] Why does "vectorizing" this simple R loop give a different result?

查看:77
本文介绍了为什么要“向量化"?这个简单的R循环会产生不同的结果吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

也许是一个很愚蠢的问题.

Perhaps a very dumb question.

我正在尝试向量化"以下循环:

I am trying to "vectorize" the following loop:

set.seed(0)
x <- round(runif(10), 2)
# [1] 0.90 0.27 0.37 0.57 0.91 0.20 0.90 0.94 0.66 0.63
sig <- sample.int(10)
# [1]  1  2  9  5  3  4  8  6  7 10
for (i in seq_along(sig)) x[i] <- x[sig[i]]
x
# [1] 0.90 0.27 0.66 0.91 0.66 0.91 0.94 0.91 0.94 0.63

我认为这只是x[sig],但结果不匹配.

I think it is simply x[sig] but the result does not match.

set.seed(0)
x <- round(runif(10), 2)
x[] <- x[sig]
x
# [1] 0.90 0.27 0.66 0.91 0.37 0.57 0.94 0.20 0.90 0.63

怎么了?

备注

显然,从输出中我们看到for循环和x[sig]是不同的.后者的含义很明确:置换,因此许多人倾向于认为循环只是在做一些错误的事情.但是永远不要这么确定.它可以是一些定义明确的动态过程.本问答的目的A并不是要判断哪个是正确的,而是要解释为什么它们不相等.希望它能为理解向量化"提供坚实的案例研究.

Obviously from the output we see that the for loop and x[sig] are different. The meaning of the latter is clear: permutation, hence many people tend to believe that the loop is just doing some wrong stuff. But never be so sure; it can be some well-defined dynamic process. The purpose of this Q & A is not to judge which is correct, but to explain why they are not equivalent. Hopefully it provides a solid case study for understanding "vectorization".

推荐答案

热身

作为热身,请考虑两个简单的示例.

warm-up

As a warm-up, consider two simpler examples.

## example 1
x <- 1:11
for (i in 1:10) x[i] <- x[i + 1]
x
# [1]  2  3  4  5  6  7  8  9 10 11 11

x <- 1:11
x[1:10] <- x[2:11]
x
# [1]  2  3  4  5  6  7  8  9 10 11 11

## example 2
x <- 1:11
for (i in 1:10) x[i + 1] <- x[i]
x
# [1] 1 1 1 1 1 1 1 1 1 1 1

x <- 1:11
x[2:11] <- x[1:10]
x
# [1]  1  1  2  3  4  5  6  7  8  9 10

在第一个示例中矢量化"成功,但在第二个示例中没有成功.为什么?

"Vectorization" is successful in the 1st example but not the 2nd. Why?

这是审慎的分析. 向量化"从循环展开开始,然后并行执行几条指令.循环是否可以向量化"取决于循环所携带的数据依赖性.

Here is prudent analysis. "Vectorization" starts by loop unrolling, then executes several instructions in parallel. Whether a loop can be "vectorized" depends on the data dependency carried by the loop.

展开示例1中的循环会得到

Unrolling the loop in example 1 gives

x[1]  <- x[2]
x[2]  <- x[3]
x[3]  <- x[4]
x[4]  <- x[5]
x[5]  <- x[6]
x[6]  <- x[7]
x[7]  <- x[8]
x[8]  <- x[9]
x[9]  <- x[10]
x[10] <- x[11]

一对一执行这些指令并同时执行它们会得到相同的结果.因此,可以对这个循环进行向量化".

Executing these instructions one by one and executing them simultaneously give identical result. So this loop can be "vectorized".

示例2中的循环为

x[2]  <- x[1]
x[3]  <- x[2]
x[4]  <- x[3]
x[5]  <- x[4]
x[6]  <- x[5]
x[7]  <- x[6]
x[8]  <- x[7]
x[9]  <- x[8]
x[10] <- x[9]
x[11] <- x[10]

不幸的是,一条一条地执行这些指令并同时执行它们不会得到相同的结果.例如,一一执行它们时,在第一条指令中修改了x[2],然后在第二条指令中将该修改后的值传递给了x[3].因此x[3]将具有与x[1]相同的值.但是,在并行执行中,x[3]等于x[2].结果,该循环不能被向量化".

Unfortunately, executing these instructions one by one and executing them simultaneously would not give identical result. For example, when executing them one by one, x[2] is modified in the 1st instruction, then this modified value is passed to x[3] in the 2nd instruction. So x[3] would have the same value as x[1]. However, in parallel execution, x[3] equals x[2]. As the result, this loop can not be "vectorized".

在向量化"理论中,

  • 示例1在数据中具有读取后写入"依赖性:x[i]在读取后被修改;
  • 示例2在数据中具有写入后读取"依赖性:x[i]在修改后被读取.
  • Example 1 has a "write-after-read" dependency in data: x[i] is modified after it is read;
  • Example 2 has a "read-after-write" dependency in data: x[i] is read after it is modified.

具有读取后写入"数据相关性的循环可以被向量化",而具有写入后读取"数据相关性的循环则不能.

A loop with "write-after-read" data dependency can be "vectorized", while a loop with "read-after-write" data dependency can not.

到目前为止,也许很多人感到困惑. 矢量化"是并行处理"吗?

Perhaps many people have been confused by now. "Vectorization" is a "parallel-processing"?

是的.在1960年代,人们想知道为高性能计算设计哪种并行处理计算机时,Flynn将设计思想分为4种类型.类别"SIMD"(单指令,多个数据)称为向量化",而具有"SIMD"可布线性的计算机称为向量处理器"或阵列处理器".

Yes. In 1960's when people wondered what kind of parallel processing computer be designed for high performance computing, Flynn classified the design ideas into 4 types. The category "SIMD" (single instruction, multiple data) is corned "vectorization", and a computer with "SIMD" cabability is called a "vector processor" or "array processor".

在1960年代,编程语言并不多.人们编写了汇编程序(当发明了编译器时是FORTRAN),以直接对CPU寄存器进行编程. "SIMD"计算机能够通过一条指令将多个数据加载到向量寄存器中,并同时对这些数据执行相同的算术运算.因此,数据处理确实是并行的.再次考虑示例1.假设向量寄存器可以容纳两个向量元素,则可以使用向量处理执行5次迭代,而不是像执行标量处理那样执行10次迭代.

In 1960's there were not many programming languages. People wrote assembly (then FORTRAN when a compiler was invented) to program CPU registers directly. A "SIMD" computer is able to load multiple data into a vector register with a single instruction and do the same arithmetic on those data at the same time. So data processing is indeed parallel. Consider our example 1 again. Suppose a vector register can hold two vector elements, then the loop can be executed with 5 iterations using vector processing rather than 10 iterations as in scalar processing.

reg <- x[2:3]  ## load vector register
x[1:2] <- reg  ## store vector register
-------------
reg <- x[4:5]  ## load vector register
x[3:4] <- reg  ## store vector register
-------------
reg <- x[6:7]  ## load vector register
x[5:6] <- reg  ## store vector register
-------------
reg <- x[8:9]  ## load vector register
x[7:8] <- reg  ## store vector register
-------------
reg <- x[10:11] ## load vector register
x[9:10] <- reg  ## store vector register

如今,有许多编程语言,例如 R . 矢量化"不再明确地指代"SIMD". R 不是我们可以编程CPU寄存器的语言. R中的向量化"仅类似于"SIMD".在先前的问答中答:在不同背景下意味着不同的事情?我试图解释这一点.下图说明了如何进行类比:

Today there are many programming languages, like R. "Vectorization" no longer unambiguously refers to "SIMD". R is not a language where we can program CPU registers. The "vectorization" in R is just an analogy to "SIMD". In a previous Q & A: Does the term "vectorization" mean different things in different contexts? I have tried to explain this. The following map illustrates how this analogy is made:

single (assembly) instruction    -> single R instruction
CPU vector registers             -> temporary vectors
parallel processing in registers -> C/C++/FORTRAN loops with temporary vectors

因此,示例1中循环的R向量化"类似于

So, the R "vectorization" of the loop in example 1 is something like

## the C-level loop is implemented by function "["
tmp <- x[2:11]  ## load data into a temporary vector
x[1:10] <- tmp  ## fill temporary vector into x

大多数时候我们都这样做

Most of the time we just do

x[1:10] <- x[2:10]

无需显式将临时向量分配给变量.创建的临时内存块没有任何R变量指向,因此将受到垃圾回收.

without explicitly assigning the temporary vector to a variable. The temporary memory block created is not pointed to by any R variable, and is therefore subject to garbage collection.

在上面,没有以最简单的示例介绍向量化".通常,矢量化"是用类似的东西引入的

In the above, "vectorization" is not introduced with the simplest example. Very often, "vectorization" is introduced with something like

a[1] <- b[1] + c[1]
a[2] <- b[2] + c[2]
a[3] <- b[3] + c[3]
a[4] <- b[4] + c[4]

其中abc在内存中没有别名,也就是说,存储矢量abc的存储块不重叠.这是一个理想的情况,因为没有内存别名就意味着没有数据依赖性.

where a, b and c are not aliased in memory, that is, the memory blocks storing vectors a, b and c do not overlap. This is an ideal case, as no memory aliasing implies no data dependency.

除数据依赖性"外,还存在控制依赖性",即在向量化"中处理"if ... else ...".但是,由于时间和空间的原因,我不会在这个问题上进行详细说明.

Apart from "data dependency", there is also "control dependency", that is, dealing with "if ... else ..." in "vectorization". However, for time and space reason I will not elaborate on this issue.

现在是时候研究问题的循环了.

Now it is time to investigate the loop in the question.

set.seed(0)
x <- round(runif(10), 2)
sig <- sample.int(10)
# [1]  1  2  9  5  3  4  8  6  7 10
for (i in seq_along(sig)) x[i] <- x[sig[i]]

展开循环即可

x[1]  <- x[1]
x[2]  <- x[2]
x[3]  <- x[9]   ## 3rd instruction
x[4]  <- x[5]
x[5]  <- x[3]   ## 5th instruction
x[6]  <- x[4]
x[7]  <- x[8]
x[8]  <- x[6]
x[9]  <- x[7]
x[10] <- x[10]

第3条指令和第5条指令之间存在写入后读取"数据相关性,因此不能对循环进行向量化"(请参见注释1 ).

There is "read-after-write" data dependency between the 3rd and the 5th instruction, so the loop can not be "vectorized" (see Remark 1).

那么,x[] <- x[sig]的作用是什么?首先让我们显式地写出临时向量:

Well then, what does x[] <- x[sig] do? Let's first explicitly write out the temporary vector:

tmp <- x[sig]
x[] <- tmp

由于"["被调用了两次,因此在此矢量化"代码后面实际上存在两个C级循环:

Since "[" is called twice, there are actually two C-level loops behind this "vectorized" code:

tmp[1]  <- x[1]
tmp[2]  <- x[2]
tmp[3]  <- x[9]
tmp[4]  <- x[5]
tmp[5]  <- x[3]
tmp[6]  <- x[4]
tmp[7]  <- x[8]
tmp[8]  <- x[6]
tmp[9]  <- x[7]
tmp[10] <- x[10]

x[1]  <- tmp[1]
x[2]  <- tmp[2]
x[3]  <- tmp[3]
x[4]  <- tmp[4]
x[5]  <- tmp[5]
x[6]  <- tmp[6]
x[7]  <- tmp[7]
x[8]  <- tmp[8]
x[9]  <- tmp[9]
x[10] <- tmp[10]

所以x[] <- x[sig]等同于

for (i in 1:10) tmp[i] <- x[sig[i]]
for (i in 1:10) x[i] <- tmp[i]
rm(tmp); gc()

这根本不是问题中给出的原始循环.

which is not at all the original loop given in the question.

如果在Rcpp中实现循环被视为矢量化",那就顺其自然了.但是没有机会用"SIMD"进一步向量化" C/C ++循环.

If implementing the loop in Rcpp is seen as a "vectorization" then let it be. But there is no chance to further "vectorize" the C / C++ loop with "SIMD".

此问与答; A由此问题解答A . OP最初呈现了一个循环

This Q & A is motivated by this Q & A. OP originally presented a loop

for (i in 1:num) {
  for (j in 1:num) {
    mat[i, j] <- mat[i, mat[j, "rm"]]
  }
}

很容易将其矢量化"为

mat[1:num, 1:num] <- mat[1:num, mat[1:num, "rm"]]

,但这可能是错误的.后来,OP将循环更改为

but it is potentially wrong. Later OP changed the loop to

for (i in 1:num) {
  for (j in 1:num) {
    mat[i, j] <- mat[i, 1 + num + mat[j, "rm"]]
  }
}

消除了内存别名问题,因为要替换的列是前num列,而要查找的列在前num列之后.

which eliminates the memory aliasing issue, because the columns to be replaced are the first num columns, while the columns to be looked up are after the first num columns.

关于问题中的循环是否正在对x进行就地"修改,我有一些评论.是的.我们可以使用tracemem:

I got some comments regarding whether the loop in the question is making "in-place" modification of x. Yes, it is. We can use tracemem:

set.seed(0)
x <- round(runif(10), 2)
sig <- sample.int(10)
tracemem(x)
#[1] "<0x28f7340>"
for (i in seq_along(sig)) x[i] <- x[sig[i]]
tracemem(x)
#[1] "<0x28f7340>"

我的R会话已为x分配了一个由地址<0x28f7340>指向的内存块,并且在运行代码时可能会看到不同的值.但是,tracemem的输出在循环后将不会更改,这意味着不会复制x.因此,该循环确实在不使用额外内存的情况下进行了就地"修改.

My R session has allocated a memory block pointed by address <0x28f7340> for x and you may see a different value when you run the code. However, the output of tracemem will not change after the loop, which means that no copy of x is made. So the loop is indeed doing "in-place" modification without using extra memory.

但是,循环没有进行就地"排列. 就地"排列是一个更复杂的操作.不仅需要沿循环交换x的元素,还需要交换sig的元素(最后,sig将为1:10).

However, the loop is not doing "in-place" permutation. "In-place" permutation is a more complicated operation. Not only elements of x need be swapped along the loop, elements of sig also need be swapped (and in the end, sig would be 1:10).

这篇关于为什么要“向量化"?这个简单的R循环会产生不同的结果吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆