Why is valarray so slow on Visual Studio 2015?

Question

To speed up the calculations in my library, I decided to use the std::valarray class. The documentation says:

std::valarray and helper classes are defined to be free of certain forms of aliasing, thus allowing operations on these classes to be optimized similar to the effect of the keyword restrict in the C programming language. In addition, functions and operators that take valarray arguments are allowed to return proxy objects to make it possible for the compiler to optimize an expression such as v1 = a * v2 + v3; as a single loop that executes v1[i] = a * v2[i] + v3[i]; avoiding any temporaries or multiple passes.

This is exactly what I need. And it works as described in the documentation when I use the g++ compiler. I have developed a simple example to test the std::valarray performance:

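#include <chrono>
#include <cstddef>
#include <iostream>
#include <valarray>

// Reports every element that differs from the expected value 1 + 3 * 2 = 7.
void check(const std::valarray<float>& a)
{
   for (std::size_t i = 0; i < a.size(); i++)
      if (a[i] != 7)
         std::cout << "Error" << std::endl;
}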

int main()
{
   const int N = 100000000;
   std::valarray<float> a(1, N);
   std::valarray<float> c(2, N);
   std::valarray<float> b(3, N);
   std::valarray<float> d(N);

   auto start = std::chrono::system_clock::now();
   d = a + b * c;
   auto end = std::chrono::system_clock::now();

   std::cout << "Valarr optimized case: "
      << (end - start).count() << std::endl;

   check(d);

   // Optimal single loop case
   start = std::chrono::system_clock::now();
   for (int i = 0; i < N; i++)
      d[i] = a[i] + b[i] * c[i];
   end = std::chrono::system_clock::now();
   std::cout << "Optimal case: " << (end - start).count() << std::endl;

   check(d);
   return 0;
}

On g++ I got:

Valarr optimized case: 1484215
Optimal case: 1472202

It seems that the whole expression d = a + b * c; really is evaluated in a single loop, which simplifies the code while maintaining performance. However, this does not work when I use Visual Studio 2015. For the same code, I get:

Valarr optimized case: 6652402
Optimal case: 1766699

The difference is almost a factor of four; there is no optimization! Why is std::valarray not working as needed on Visual Studio 2015? Am I doing everything right? How can I solve the problem without abandoning std::valarray?

Recommended answer

Am I doing everything right?

You're doing everything right. The problem is in the Visual Studio std::valarray implementation.

Why is std::valarray not working as needed on Visual Studio 2015?

Just open the implementation of any valarray operator, for example operator+. You will see something like (after macro expansion):

   template<class _Ty> inline
      valarray<_Ty> operator+(const valarray<_Ty>& _Left,
         const valarray<_Ty>& _Right)
   {
      // A fresh valarray is allocated for every operator call.
      valarray<_Ty> _Ans(_Left.size());
      for (size_t _Idx = 0; _Idx < _Ans.size(); ++_Idx)
         _Ans[_Idx] = _Left[_Idx] + _Right[_Idx];
      return (_Ans);
   }

As you can see, each operator allocates a new valarray, fills it with the result of the operation in its own loop, and returns it. There really is no optimization. I do not know why, but it is a fact. It looks like in Visual Studio, std::valarray was added for compatibility only.
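To make the cost of that design concrete, here is a minimal sketch of an eager vector type that behaves like the operator shown above. EagerVec, allocation_count and temporaries_for_expression are hypothetical names invented for this illustration; this is not the Visual Studio source, only a model of the same pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counter of element buffers allocated by EagerVec.
inline int& allocation_count() {
    static int count = 0;
    return count;
}

// Hypothetical eager vector: every operator builds a complete result object.
struct EagerVec {
    std::vector<float> data;
    EagerVec(float value, std::size_t n) : data(n, value) { ++allocation_count(); }
};

inline EagerVec operator+(const EagerVec& l, const EagerVec& r) {
    assert(l.data.size() == r.data.size());
    EagerVec ans(0.0f, l.data.size());               // temporary buffer
    for (std::size_t i = 0; i < ans.data.size(); ++i)
        ans.data[i] = l.data[i] + r.data[i];         // one full pass
    return ans;
}

inline EagerVec operator*(const EagerVec& l, const EagerVec& r) {
    assert(l.data.size() == r.data.size());
    EagerVec ans(0.0f, l.data.size());               // temporary buffer
    for (std::size_t i = 0; i < ans.data.size(); ++i)
        ans.data[i] = l.data[i] * r.data[i];         // another full pass
    return ans;
}

// Returns how many buffers are allocated while evaluating d = a + b * c.
inline int temporaries_for_expression(std::size_t n) {
    EagerVec a(1.0f, n), b(3.0f, n), c(2.0f, n), d(0.0f, n);
    const int before = allocation_count();
    d = a + b * c;                                   // b * c first, then a + tmp
    assert(d.data[0] == 7.0f);
    return allocation_count() - before;
}
```

In this model the single statement d = a + b * c allocates two temporary buffers and runs two separate loops over the data before the result is moved into d, which is the kind of extra memory traffic the hand-written loop avoids.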

For comparison, consider the GNU implementation. There, each operator returns the template class _Expr, which contains only the operation and no data. The real computation is performed in the assignment operator, more specifically in the __valarray_copy function. Thus, until you perform the assignment, all actions are performed on the proxy object _Expr. Only when operator= is called is the operation stored in _Expr evaluated, in a single loop. This is the reason why you get such good results with g++.
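The proxy-object idea can be sketched in a few lines. The names below (Expr, Sum, Product, LazyVec) are hypothetical and chosen for this illustration; the sketch follows the general expression-template technique and is not a copy of the GNU _Expr machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP tag so the operators below only match types from this sketch.
template <class Derived>
struct Expr {
    const Derived& self() const { return static_cast<const Derived&>(*this); }
};

// Proxy for l + r: holds references to its operands, no element data.
template <class L, class R>
struct Sum : Expr<Sum<L, R>> {
    const L& l;
    const R& r;
    Sum(const L& lhs, const R& rhs) : l(lhs), r(rhs) { assert(lhs.size() == rhs.size()); }
    float operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

// Proxy for l * r, built the same way.
template <class L, class R>
struct Product : Expr<Product<L, R>> {
    const L& l;
    const R& r;
    Product(const L& lhs, const R& rhs) : l(lhs), r(rhs) { assert(lhs.size() == rhs.size()); }
    float operator[](std::size_t i) const { return l[i] * r[i]; }
    std::size_t size() const { return l.size(); }
};

class LazyVec : public Expr<LazyVec> {
public:
    LazyVec(float value, std::size_t n) : data_(n, value) {}
    float operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    // The only loop: e[i] expands to a[i] + b[i] * c[i] for d = a + b * c.
    template <class E>
    LazyVec& operator=(const Expr<E>& expr) {
        const E& e = expr.self();
        assert(e.size() == size());
        for (std::size_t i = 0; i < size(); ++i)
            data_[i] = e[i];
        return *this;
    }

private:
    std::vector<float> data_;
};

// Operators build proxies only; nothing is computed here.
template <class L, class R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

template <class L, class R>
Product<L, R> operator*(const Expr<L>& l, const Expr<R>& r) {
    return Product<L, R>(l.self(), r.self());
}
```

With LazyVec a(1.0f, n), b(3.0f, n), c(2.0f, n), d(0.0f, n), the statement d = a + b * c; builds a Sum that refers to a and to a temporary Product, then fills d in one pass with no intermediate buffers. Because the proxies hold references, they must be consumed within the same full expression; storing one with auto and using it later would leave dangling references in this simplified version.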

How can I solve the problem?

You need to find a suitable std::valarray implementation on the internet or you can write your own. You can use the GNU implementation as an example.
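If replacing the implementation is too heavy, one lighter option, not part of the original answer, is to keep std::valarray as the container and route hot expressions through a small helper that runs the single loop from the question. The helper names assign_elementwise and fused_multiply_add below are hypothetical; this is a sketch under that assumption, not a library facility.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Hypothetical helper: fills `out` in one pass, calling element_at(i)
// for each index, so no temporary valarray is created.
template <class ElementFn>
void assign_elementwise(std::valarray<float>& out, ElementFn element_at) {
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = element_at(i);
}

// Computes d = a + b * c element by element in a single loop.
inline void fused_multiply_add(std::valarray<float>& d,
                               const std::valarray<float>& a,
                               const std::valarray<float>& b,
                               const std::valarray<float>& c) {
    assert(a.size() == d.size() && b.size() == d.size() && c.size() == d.size());
    assign_elementwise(d, [&](std::size_t i) { return a[i] + b[i] * c[i]; });
}
```

This gives up the operator syntax for the expressions that matter most, but it should behave the same on every standard library because it does not depend on how the valarray operators are implemented.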
