是否有可能优化a = func(a)的编译器? [英] Is it possible to have a compiler which optimizes a = func(a)?

查看:116
本文介绍了是否有可能优化a = func(a)的编译器?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有一个类型为A的对象。对于这种类型为A的任何函数,请考虑这种情况-> A(即,使用类型为A的对象并返回另一个类型为A的对象):

  foo = func(foo)

在这里,最简单的情况是将 func(foo)的结果复制到 foo 中。
是否可以对此进行优化,以便:




  • foo 得到在 func



中就地修改的语言没有限制。我想知道的是,语言必须具有哪些约束和属性才能实现这种优化。



示例(伪代码):

  type Matrix = List< List< int>> 

矩阵rotate90Deg(矩阵x):
矩阵结果(x.columns,x.rows)#假定它有一个构造函数,它将行数和列数作为args。
for(int i = 0; i for(int j = 0; j< x.columns; j ++):
结果[i] [j] = x [j] [i]
返回结果

矩阵a = [[1,2,3],[4,5,6],[7,8, 9]]
a = rotation90Deg(a)

此处,是否可以优化代码因此,它不会为新的矩阵(结果)分配内存,而只是修改传递的原始矩阵。

解决方案

首先,您必须意识到某些操作本来就不可能就地计算。矩阵矩阵乘法就是一个例子, rotate90Deg 属于此类,因为这样的运算实际上是通过适当的乘法矩阵进行的矩阵乘法。



现在以您的示例为例,您实际上编写了一个矩阵转置函数。矩阵转置可以就地完成,因为您要交换数字对,但是我怀疑任何编译器都可以自动检测到这一点并为您优化它。确实,有许多技巧可以优化矩阵转置,以便对缓存友好,从而获得巨大的性能提升。尽管如此,通过简单的实现,您几乎肯定会得到与Aditya Kumar在他的答案中所描述的非常相似的东西。






<正如我之前使用朴素一词所预示的那样,程序员可以通过高级模板和其他元编程技术,以极其优化的方式诱使编译器内联很多东西。 (至少在C ++中,也许在其他允许您重载 operator = 的语言中。)对于有兴趣进行此方法的案例研究的所有人,请采取看看本征矩阵库,以及它如何处理<$ c之类的简单操作$ c> u = v + w; 其中三个变量都是浮点矩阵。以下是关键点的简要概述。



天真的实现会使 operator + 重载以返回临时值,而 operator = 将该临时副本复制到结果中。当然,在C ++ 11中,通过move构造函数避免在赋值过程中复制最终副本非常容易,但是如果您在右侧有多个运算符(例如 u = 3.15f * u.transposed()+ 5.0f; ,因为每个运算符/方法都将返回一个临时变量,并且该临时变量必须循环遍历以便处理



长话短说,Eigen所做的不是在发生相应的函数调用时执行每个操作,而是返回模板化的某种函子仅描述需要执行的操作,而所有实际工作最终都在 operator = 中发生,从而使编译器能够运行发出一个单一的内联循环,仅遍历数据一次并真正就地进行操作。


Say I have an object of type A. Consider this case for any function of the type A -> A (i.e. takes object of type A and returns another object of type A):

foo = func(foo)

Here, the simplest case would be to for the result of func(foo) to be copied into foo. Is it possible to optimize this so that:

  • foo gets modified inplace in func

There are no constraints on the language used. What I want to know is what constraints and properties the language must have to enable such an optimization. Are there any existing languages which perform such an optimization?

Example(in pseudo code):

type Matrix = List<List<int>>

Matrix rotate90Deg(Matrix x):
   Matrix result(x.columns, x.rows) #Assume it has a constructor which takes as args the num of rows, and num of cols.
   for (int i = 0; i < x.rows; i++):
       for (int j = 0; j < x.columns; j++):
           result[i][j] = x[j][i]
   return result

Matrix a = [[1,2,3],[4,5,6],[7,8,9]]
a = rotate90Deg(a)

Here, is it possible to optimize the code so that it doesn't allocate memory for a new matrix(result), and instead just modifies the original matrix passed.

解决方案

First of all, you have to realize that some operations are inherently not possible to be computed in-place. Matrix-matrix multiplication is an example of this, and rotate90Deg would fall under this category since such an operation is actually a matrix multiplication by the appropriate multiplication matrix.

Now as for your example, you actually coded up a matrix transpose function. Matrix transpose can be done in-place since you are swapping pairs of numbers, but I doubt that any compilers can automatically detect this and optimize it for you. Indeed, there are many, many tricks that one can do to optimize matrix transpose in order to be cache-friendly in order to gain huge performance increases. Nevertheless, with an naive implementation, you will almost certainly end up with something very similar to what Aditya Kumar describes in his answer.


As I have foreshadowed by using the word "naive" earlier, programmers can coax the compiler to inline lots and lots of things in extremely optimized ways through advanced templating and other meta-programming techniques. (At least in C++, and maybe other languages that allow you to overload operator =.) For anyone interested in a case study of how this is done and what is involved, take a look at the Eigen matrix library, and how it handles a simple operation like u = v + w; where the three variables are all matrices of floats. Following is a brief overview of the key points.

A naive implementation would overload operator+ to return a temporary and operator= to copy that temporary to the result. Of course, in C++11 it is pretty easy to avoid the final copy during assignment by way of move constructors, but you will still have unnecessary temporaries if you had something a little more complex with multiple operators on the right hand side like u = 3.15f * u.transposed() + 5.0f; since each operator/method would return a temporary, and that temporary would have to be looped over in order to process the next operator.

Long story short, what Eigen does is rather than perform each operation when the corresponding function call occurs, the calls return a templated functor of sorts which merely describes the operation that needs to take place, and all the actual work ends up happening in operator =, thus enabling the compiler to emit a single, inlined loop for traversing the data only once and doing the operation truly in-place.

这篇关于是否有可能优化a = func(a)的编译器?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆