Expression templates: improving performance in evaluating expressions?


Problem description

By the expression templates technique, a matrix expression like

D = A*B+sin(C)+3.;

is pretty much equivalent, in terms of computing performance, to a hand-written for loop.

Now, suppose that I have the following two expressions

D = A*B+sin(C)+3.;
F = D*E;
cout << F << "\n";

In a "classical" implementation of expression templates, the computing performance will be pretty much the same as that of two for loops in sequence. This is because each expression is evaluated as soon as its = operator is encountered.

My question is: is there any technique (for example, using placeholders?) to recognize that the values of D are actually unused and that the values of interest are the sole elements of F, so that only the expression

F = E*(A*B+sin(C)+3.);

is evaluated and the whole performance is equivalent to that of a single for loop?
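To make the "two loops versus one loop" comparison concrete, here is a minimal hand-written sketch. It assumes that all operations are element-wise and that the operands are plain std::vector<double> objects; the function names two_loops and one_loop are made up for illustration and do not come from any library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Two loops in sequence: D is materialised in memory, then read back to build F.
inline Vec two_loops(const Vec& A, const Vec& B, const Vec& C, const Vec& E) {
    assert(A.size() == B.size() && B.size() == C.size() && C.size() == E.size());
    Vec D(A.size()), F(A.size());
    for (std::size_t i = 0; i < A.size(); ++i)
        D[i] = A[i] * B[i] + std::sin(C[i]) + 3.0;
    for (std::size_t i = 0; i < A.size(); ++i)
        F[i] = D[i] * E[i];
    return F;
}

// One fused loop: D never exists as a stored vector.
inline Vec one_loop(const Vec& A, const Vec& B, const Vec& C, const Vec& E) {
    assert(A.size() == B.size() && B.size() == C.size() && C.size() == E.size());
    Vec F(A.size());
    for (std::size_t i = 0; i < A.size(); ++i)
        F[i] = E[i] * (A[i] * B[i] + std::sin(C[i]) + 3.0);
    return F;
}
```

Both functions compute the same F; the second one simply avoids writing and re-reading the intermediate D, which is the saving the question is after.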

Of course, such a hypothetical technique should also be able to go back and evaluate the expression

D = A*B+sin(C)+3.;

if later in the code the values of D are needed.

Thank you in advance for any help.

EDIT: Results of experimenting with the solution suggested by Evgeny

Original instruction:

Result D=A*B-sin(C)+3.;

Computing time: 32ms

Two-step instruction:

Result Intermediate=A*B;
Result D=Intermediate-sin(C)+3.;

Computing time: 43ms

Solution with auto:

auto&& Intermediate=A*B;
Result D=Intermediate-sin(C)+3.;

Computing time: 32ms.

In conclusion, auto&& restored the original computing time of the single-instruction case.

EDIT: Summarizing relevant links, following the suggestions by Evgeny

Copy Elision

What does auto tell us

Universal References in C++11

C++ Rvalue References Explained

C++ and Beyond 2012: Scott Meyers - Universal References in C++11

Solution

Evaluation of an expression template typically happens when you save the result to some special type, like:

Result D = A*B+sin(C)+3.;

The type of the expression:

A*B+sin(C)+3.

is not Result, but something that is convertible to Result. Evaluation happens during that conversion.
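A minimal sketch of this "evaluation on conversion" idea, assuming a toy element-wise vector type Vec in place of Result and supporting only operator+. The names Expr, Sum and Vec are hypothetical and are not taken from any particular library.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// CRTP base: marks a type as an expression and gives access to the real type.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node for "lhs + rhs": it only remembers its operands, no arithmetic yet.
template <typename L, typename R>
struct Sum : Expr<Sum<L, R>> {
    const L& lhs;
    const R& rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

// Plays the role of Result (assumption: element-wise semantics).
struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}

    // Converting constructor: this single loop is where evaluation happens.
    template <typename E>
    Vec(const Expr<E>& e) : data(e.self().size()) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e.self()[i];
    }

    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

inline double example_first_element() {
    Vec a(4, 1.0), b(4, 2.0), c(4, 3.0);
    // The expression itself has a node type, not Vec.
    static_assert(std::is_same<decltype(a + b + c), Sum<Sum<Vec, Vec>, Vec>>::value,
                  "a + b + c is an expression node");
    Vec d = a + b + c;  // the conversion to Vec runs the one fused loop
    return d[0];
}
```

Within the single statement Vec d = a + b + c; all intermediate nodes are still alive when the conversion runs, so storing operands by reference is safe there.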


My question is: is there any technique (for example, using placeholders?) to recognize that the values of D are actually unused

This kind of "transformation", from

Result D = A*B+sin(C)+3.;
Result F = D*E;

to

Result F = (A*B+sin(C)+3.)*E;

is possible when you do not evaluate D. To do this, you typically capture D as its real expression type, for instance with the help of auto:

auto &&D = A*B+sin(C)+3.;
Result F = D*E;
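The effect can be checked with a toy expression-template type that counts how many evaluation loops run. This is only a sketch under assumptions: Vec stands in for Result, only operator+ is supported, and the counter g_loops exists purely for illustration. The deferred expression here is deliberately a single operation on named operands, because these toy nodes hold their operands by reference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

static int g_loops = 0;  // number of evaluation loops executed (illustration only)

template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

template <typename L, typename R>
struct Sum : Expr<Sum<L, R>> {
    const L& lhs;
    const R& rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    template <typename E>
    Vec(const Expr<E>& e) : data(e.self().size()) {
        ++g_loops;  // one loop per conversion from an expression
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e.self()[i];
    }
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Intermediate stored as Vec: evaluated immediately, so two loops run in total.
inline int loops_with_vec_intermediate() {
    Vec a(8, 1.0), b(8, 2.0), c(8, 3.0);
    g_loops = 0;
    Vec d = a + b;
    Vec f = d + c;
    assert(f[0] == 6.0);
    return g_loops;
}

// Intermediate captured with auto&&: d is a Sum<Vec, Vec> node, nothing is
// evaluated until f is constructed, so only one loop runs.
inline int loops_with_auto_intermediate() {
    Vec a(8, 1.0), b(8, 2.0), c(8, 3.0);
    g_loops = 0;
    auto&& d = a + b;
    Vec f = d + c;
    assert(f[0] == 6.0);
    return g_loops;
}
```

In this sketch the first function runs two loops and the second runs one, which mirrors the timing difference reported in the question's edit.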


However, you should be careful: an expression template sometimes captures references to its operands, and if one of them is an rvalue, it expires at the end of the full expression that created it:

auto &&D = A*get_large_rvalue();
// At this point, the result of get_large_rvalue() has been destroyed
// and D holds a dangling reference
Result F = D*E;

Where get_large_rvalue is:

LargeMatrix get_large_rvalue();

Its result is an rvalue, which expires at the end of the full expression in which get_large_rvalue was called. If something within the expression stores a pointer/reference to it (for later evaluation) and you "defer" the evaluation, that pointer/reference will outlive the object it refers to.

In order to prevent this, you should do:

auto &&intermediate = get_large_rvalue(); // it would live till the end of scope
auto &&D = A*intermediate ;
Result F = D*E;


I'm not familiar with C++11 but, as I understand, auto asks the compiler to determine the type of a variable from its initialization

Yes, exactly. This is called Type Inference/Deduction.

C++98/03 had type deduction only for function templates; C++11 adds auto.

Do you know how CUDA and C++11 interact with each other?

I haven't used CUDA (though I have used OpenCL), but I guess there will be no problems using C++11 in host code. Some C++11 features may not be supported within device code, but for your purpose you need auto only in host code.

Finally, is there any possibility with only C++?

Do you mean pre-C++11, i.e. C++98/C++03? Yes, it is possible, but it has more syntactic noise, which may be a reason to reject it:

// somewhere
{
    use_D(A*B+sin(C)+3.);
}
// ...
template<typename Expression>
void use_D(Expression D) // depending on your expression template library
                         //   it may be better to use (const Expression &D)
{
    Result F = D*E;
}
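A self-contained variant of the same idea, written (as far as I can tell) with C++98/03 features only. The types Expr, Sum and Vec are a hypothetical toy library, E is passed as a second parameter instead of being taken from the surrounding scope, and g_loops is an illustrative counter. Because the expression is a function argument, every temporary node inside it stays alive until the call returns, so the deferred evaluation inside use_D is safe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

static int g_loops = 0;  // number of evaluation loops executed (illustration only)

template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

template <typename L, typename R>
struct Sum : Expr<Sum<L, R> > {
    const L& lhs;
    const R& rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    template <typename E>
    Vec(const Expr<E>& e) : data(e.self().size()) {
        ++g_loops;
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e.self()[i];
    }
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Template argument deduction gives D its real expression type without auto.
template <typename Expression>
Vec use_D(const Expression& D, const Vec& E) {
    Vec F = D + E;  // the only evaluation loop
    return F;
}

inline int loops_with_function_template() {
    Vec a(4, 1.0), b(4, 2.0), c(4, 3.0), e(4, 4.0);
    g_loops = 0;
    Vec f = use_D(a + b + c, e);  // a + b + c is never stored as a Vec
    assert(f[0] == 10.0);
    return g_loops;
}
```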


I'm now using CUDA/Visual Studio 2010 under Windows. Could you please recommend a compiler/toolset/environment for both OSes to use C++11 in the framework of my interest (GPGPU and CUDA), if you know any?

MSVC 2010 supports some parts of C++11. In particular, it supports auto. So, if auto is all you need from C++11, MSVC 2010 is OK.

But if you can use MSVC 2012, I would recommend sticking with it: it has much better C++11 support.

Also, the trick auto &&intermediate = get_large_rvalue(); seems not to be "transparent" to a third-party user (who is not supposed to know about such an issue). Am I right? Any alternative?

If an expression template stores references to some values and you defer its evaluation, you must make sure that everything it refers to is still alive at the point of evaluation. Use any method you like; it can be done without auto, like:

LargeMatrix temp = get_large_rvalue();

Or maybe even a global/static variable (a less preferred method).
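One further option, which is not part of the answer above but is a known design choice in some expression-template code, is to let the expression node itself own rvalue operands: lvalues are stored by reference, rvalues are moved into the node. The sketch below is an illustration under assumptions; ExprTag, Stored, Sum, Vec and evaluate are all hypothetical names, and get_large_rvalue is a stand-in returning a small Vec. It requires C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

struct ExprTag {};  // marker base so operator+ only matches these toy types

struct Vec : ExprTag {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Lvalue operands are kept by const reference, rvalue operands by value.
template <typename T>
using Stored = typename std::conditional<
    std::is_lvalue_reference<T>::value,
    const typename std::remove_reference<T>::type&,
    typename std::decay<T>::type>::type;

template <typename L, typename R>
struct Sum : ExprTag {
    Stored<L> lhs;
    Stored<R> rhs;
    Sum(L&& l, R&& r) : lhs(std::forward<L>(l)), rhs(std::forward<R>(r)) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

template <typename T>
using IsExpr = std::is_base_of<ExprTag, typename std::decay<T>::type>;

template <typename L, typename R,
          typename = typename std::enable_if<IsExpr<L>::value && IsExpr<R>::value>::type>
Sum<L, R> operator+(L&& l, R&& r) {
    return Sum<L, R>(std::forward<L>(l), std::forward<R>(r));
}

// Explicit evaluation step: one loop over the whole expression tree.
template <typename E>
Vec evaluate(const E& e) {
    Vec out(e.size());
    for (std::size_t i = 0; i < out.size(); ++i) out.data[i] = e[i];
    return out;
}

// Stand-in for the answer's get_large_rvalue().
inline Vec get_large_rvalue() { return Vec(4, 2.0); }

inline double example_rvalue_operand() {
    Vec a(4, 1.0), e(4, 10.0);
    auto&& d = a + get_large_rvalue();  // the rvalue is moved into the node
    static_assert(std::is_same<decltype(d.rhs), Vec>::value,
                  "the rvalue operand is owned by the node");
    Vec f = evaluate(d + e);  // safe: d owns the former temporary
    return f[0];
}
```

The trade-off is that each rvalue operand costs a move into the node, while the user no longer has to name temporaries by hand.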

A last comment/question: to use auto &&D = A*B+sin(C)+3.; it seems that I should overload the operator= for assignments between two expressions, right?

No, this form requires neither a copy/move assignment operator nor a copy/move constructor.

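Basically, it just names the temporary value and prolongs its lifetime to the end of the scope. Note that only the temporary bound directly to the reference is extended, not temporaries nested inside the expression. Check this SO.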

But, if you would use another form:

auto D = A*B+sin(C)+3.;

In that case a copy/move/conversion constructor may be required in order to compile (though the actual copy can be optimized away by the compiler through copy elision).

Also, switching between using auto (for the intermediate expressions) and Result to force calculation seems to be non-transparent to a third party user. Any alternative?

I am not sure there is any alternative. This is the nature of expression templates: while you use them in expressions, they return some internal intermediate types, but when you store the result in some "special" type, evaluation is triggered.
