Visual Studio 2015 loop performance degradation when storing pointers in a struct

Question

I found that the following loop

std::vector<int> x,y,z;
...
unsigned int res=0;
auto p=x.data();
auto q=y.data();
auto r=z.data();
for(unsigned int i=0;i<N;++i){
  res+=*p++ +*q++ +*r++;
}

runs almost twice as fast as this one, which merely packs the pointers in a struct:

struct pdata{int *px,*py,*pz;};

unsigned int res=0;
pdata d{x.data(),y.data(),z.data()};
for(unsigned int i=0;i<N;++i){
  res+=*d.px++ +*d.py++ +*d.pz++;
}
return res;

Is this a known performance issue? Please find below the complete program and performance measurements for Visual C++ 2015 in 32-bit (x86) release mode (default settings), Windows 7 64-bit, Intel Core i5-2520M @2.5GHz:

#include <algorithm>
#include <array>
#include <chrono>
#include <cmath>
#include <numeric> 

std::chrono::high_resolution_clock::time_point measure_start,measure_pause;

template<typename F>
double measure(F f)
{
  using namespace std::chrono;

  static const int              num_trials=10;
  static const milliseconds     min_time_per_trial(200);
  std::array<double,num_trials> trials;
  volatile decltype(f())        res; /* to avoid optimizing f() away */

  for(int i=0;i<num_trials;++i){
    int                               runs=0;
    high_resolution_clock::time_point t2;

    measure_start=high_resolution_clock::now();
    do{
      res=f();
      ++runs;
      t2=high_resolution_clock::now();
    }while(t2-measure_start<min_time_per_trial);
    trials[i]=duration_cast<duration<double>>(t2-measure_start).count()/runs;
  }
  (void)res; /* silence unused-variable warning */

  std::sort(trials.begin(),trials.end());
  return std::accumulate(
    trials.begin()+2,trials.end()-2,0.0)/(trials.size()-4);
}

template<typename F>
double measure(unsigned int n,F f)
{
  double t=measure(f);
  return (t/n)*10E9; /* note: 10E9 == 1e10, so this is ten times the time per element in ns */
}    

#include <iostream>
#include <vector>

int main()
{
  static const unsigned int N=100000;
  std::vector<int> x(N),y(N),z(N);

  for(int i=0;i<N;i++){
    x[i]=i;
    y[i]=i+1;
    z[i]=i+2;
  }

  std::cout<<measure(N,[&]{
    unsigned int res=0;
    auto p=x.data();
    auto q=y.data();
    auto r=z.data();
    for(unsigned int i=0;i<N;++i){
      res+=*p++ +*q++ +*r++;
    }
    return res;
  })<<",";

  std::cout<<measure(N,[&]{
    struct pdata{int *px,*py,*pz;};

    unsigned int res=0;
    pdata d{x.data(),y.data(),z.data()};
    for(unsigned int i=0;i<N;++i){
      res+=*d.px++ +*d.py++ +*d.pz++;
    }
    return res;
  })<<"\n";
}

Output

4.24541,7.44588

Thanks,

Answer

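As the poster has pointed out above, the compiler has vectorized the loop in the first case. It is likely having trouble proving (or perhaps does not try to prove at all) that the memory the std::vector pointers refer to cannot overlap the struct itself, which lives on the stack. Without that guarantee, every increment stored back into d.px, d.py or d.pz could in principle change a value that a later iteration loads, so the loop is not vectorized.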
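To make the aliasing argument concrete, here is a minimal sketch (not from the original thread) of a common workaround: copy the pointers out of the struct into local variables before the loop, so the loop body performs no stores at all. The name `sum_via_locals` is hypothetical, and whether Visual C++ 2015 then vectorizes the loop has not been verified here.

```cpp
#include <cassert>

// Same layout as the struct in the question.
struct pdata { int *px, *py, *pz; };

// Hypothetical helper: the pointers are copied into locals whose addresses
// are never taken, so the compiler can keep them in registers. The loop body
// then contains only loads, so nothing inside it can overlap the data read
// through px, py and pz.
unsigned int sum_via_locals(const pdata& d, unsigned int n)
{
  assert(n == 0 || (d.px && d.py && d.pz));
  const int* px = d.px;
  const int* py = d.py;
  const int* pz = d.pz;
  unsigned int res = 0;
  for (unsigned int i = 0; i < n; ++i) {
    res += *px++ + *py++ + *pz++;
  }
  return res;
}
```

This is the same transformation the first (fast) loop in the question already makes by hand: the pointers are plain locals rather than struct members.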

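You can try marking the pointers with the restrict qualifier, though I am not sure it is supported in your target environment (restrict is a C99 keyword rather than standard C++; Visual C++ provides __restrict as a compiler extension).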
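A minimal sketch of the restrict suggestion applied to the question's struct, assuming a compiler that accepts `__restrict` on data members (GCC and Clang do; whether Visual C++ 2015 honours it there, and whether it then vectorizes, is an assumption). `pdata_restrict` and `sum_restrict` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical variant of the question's struct with __restrict-qualified
// members. __restrict is a compiler extension, not standard C++.
struct pdata_restrict {
  const int* __restrict px;
  const int* __restrict py;
  const int* __restrict pz;
};

// Same loop as the question's second lambda. The restrict qualifiers promise
// that the ints read through px, py and pz are not modified by any other
// route while they are in use, so the compiler may assume the increments
// stored back into d cannot change what the loads return.
unsigned int sum_restrict(const int* x, const int* y, const int* z,
                          unsigned int n)
{
  assert(n == 0 || (x && y && z));
  pdata_restrict d{x, y, z};
  unsigned int res = 0;
  for (unsigned int i = 0; i < n; ++i) {
    res += *d.px++ + *d.py++ + *d.pz++;
  }
  return res;
}
```

restrict is a promise from the programmer, not a check: if it is false (for example, if one of the pointers pointed into the struct itself), the behaviour is undefined.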
