C++ nested loop performance

Question

I basically have two vectors: one for a large number of elements, and a second for a small number of probes used to sample data from the elements. I stumbled upon the question of which order to nest the two loops in. Naturally, I thought having the outer loop over the larger vector would be beneficial.

Implementation 1:

for(auto& elem: elements) {
    for(auto& probe: probes) {
        probe.insertParticleData(elem);
    }
}

However, the second implementation seems to take only half the time.

Implementation 2:

for(auto& probe: probes) {
    for(auto& elem: elements) {
        probe.insertParticleData(elem);
    }
}

What could be the reason for that?

Edit:

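Timings were generated by the following code:

#include <ctime>

// std::clock() reports processor time used by the program, not wall-clock time.
std::clock_t t_begin_ps = std::clock();
... // timed code
std::clock_t t_end_ps = std::clock();
double elapsed_secs_ps = double(t_end_ps - t_begin_ps) / CLOCKS_PER_SEC;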

When inserting an element's data, I basically do two things: test whether the distance to the probe is below a limit, and then compute an average.

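bool probe::insertParticleData(const elem& pP) {
    // Skip elements that lie outside this probe's sampling region.
    if (!isInside(pP.position())) { return false; }
    ... // compute alpha and beta
    // Update the running weighted average of the velocity.
    avg_vel = alpha * avg_vel + beta * pP.getVel();
    return true;
}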

To give an idea of the memory usage: I have approx. 10k elements, which are objects with 30 double data members. For the test I used 10 probes, each containing 15 doubles.

Answer

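Today's CPUs are heavily optimized for linear access to memory, so a few long loops will beat many short loops. You want the inner loop to iterate over the long vector: the current probe's small state can stay in cache while the elements are read sequentially, a pattern that hardware prefetchers generally handle well.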
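To check the claim on your own machine, the two loop orders can be timed side by side. The following is only a minimal sketch, not the asker's actual code: `Elem`, `Probe`, `makeElements`, `makeProbes`, `secondsFor` and `sameResults` are hypothetical stand-ins, the one-dimensional `isInside` test is invented for illustration, and the weights `alpha` and `beta` are assumed to form a plain running mean because the question elides how they are computed. With the sizes from the question, the elements occupy about 10,000 × 30 × 8 bytes = 2.4 MB, while the probe data there is only 10 × 15 × 8 bytes = 1.2 KB.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical element: 30 doubles, as described in the question.
struct Elem {
    double data[30];
    double position() const { return data[0]; }  // assumed 1-D position
    double getVel() const { return data[1]; }
};

// Hypothetical probe: averages the velocity of elements within `radius`.
struct Probe {
    double center = 0.0;
    double radius = 0.0;
    double avg_vel = 0.0;
    std::size_t count = 0;

    bool isInside(double pos) const { return std::abs(pos - center) <= radius; }

    bool insertParticleData(const Elem& e) {
        if (!isInside(e.position())) return false;
        ++count;
        // Assumption: alpha and beta implement a plain running mean.
        const double beta = 1.0 / static_cast<double>(count);
        const double alpha = 1.0 - beta;
        avg_vel = alpha * avg_vel + beta * e.getVel();
        return true;
    }
};

// Implementation 1: outer loop over the long vector.
void elementsOuter(const std::vector<Elem>& elements, std::vector<Probe>& probes) {
    for (const auto& elem : elements)
        for (auto& probe : probes)
            probe.insertParticleData(elem);
}

// Implementation 2: inner loop over the long vector.
void probesOuter(const std::vector<Elem>& elements, std::vector<Probe>& probes) {
    for (auto& probe : probes)
        for (const auto& elem : elements)
            probe.insertParticleData(elem);
}

// Wall-clock duration of a callable, in seconds.
template <class F>
double secondsFor(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Synthetic test data: positions cycle through 0..99, velocity equals the index.
std::vector<Elem> makeElements(std::size_t n) {
    std::vector<Elem> v(n);  // value-initialized, so all doubles start at 0
    for (std::size_t i = 0; i < n; ++i) {
        v[i].data[0] = static_cast<double>(i % 100);
        v[i].data[1] = static_cast<double>(i);
    }
    return v;
}

// Probes centered at 0, 10, 20, ... with a radius of 5.
std::vector<Probe> makeProbes(std::size_t n) {
    std::vector<Probe> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i].center = 10.0 * static_cast<double>(i);
        v[i].radius = 5.0;
    }
    return v;
}

// Sanity check: both loop orders should produce the same per-probe results.
bool sameResults(const std::vector<Probe>& a, const std::vector<Probe>& b) {
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i].count != b[i].count) return false;
        const double tol = 1e-9 * (1.0 + std::abs(a[i].avg_vel));
        if (std::abs(a[i].avg_vel - b[i].avg_vel) > tol) return false;
    }
    return true;
}
```

To compare, build the data once with `makeElements(10000)`, create a separate probe set with `makeProbes(10)` for each variant, and wrap each call in `secondsFor`, for example `secondsFor([&] { probesOuter(elements, probes); })`. Compile with optimizations enabled and repeat each measurement several times. A single run over data of this size is short enough that noise can dominate, and the ratio between the two orders will depend on the compiler and the hardware.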
