C++ nested loop performance

Published: 2021/5/30 21:20:20. Tags: c++, performance, loops


Question

I basically have two vectors: one holding a large number of elements and a second holding a small number of probes that sample data from the elements. I stumbled upon the question of which order to nest the two loops in. Naturally, I thought having the outer loop over the larger vector would be beneficial.

Implementation 1:

for(auto& elem: elements) {
    for(auto& probe: probes) {
        probe.insertParticleData(elem);
    }
}

However, it seems that the second implementation takes only half the time.

Implementation 2:

for(auto& probe: probes) {
    for(auto& elem: elements) {
        probe.insertParticleData(elem);
    }
}

What could be the reason for that?

Edit:

Timings were generated by the following code:

std::clock_t t_begin_ps = std::clock();  // std::clock() reports CPU time, not wall-clock time
... // timed code
std::clock_t t_end_ps = std::clock();
double elapsed_secs_ps = double(t_end_ps - t_begin_ps) / CLOCKS_PER_SEC;

When inserting an element's data, I basically do two things: test whether the distance to the probe is below a limit, and then compute an average:

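bool probe::insertParticleData(const elem& pP) {
    // Skip elements that lie outside this probe's range.
    if (!isInside(pP.position())) { return false; }
    ... // compute alpha and beta
    // Blend the element's velocity into the running average.
    avg_vel = alpha*avg_vel + beta*pP.getVel();
    return true;
}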

To give an idea of the memory usage: I have approx. 10k elements, which are objects with 30 double data members. For the test I used 10 probes, each containing 15 doubles.
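The figures above can be turned into rough byte counts. The sketch below is only an illustration: `Elem`, `Probe` and `footprint_bytes` are hypothetical names, and it assumes both containers store their objects contiguously with nothing but the stated doubles inside. With 8-byte doubles this comes to about 2.4 MB of element data against about 1.2 KB of probe data, so the probes would fit comfortably in a typical L1 data cache while the elements would not.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the question's types: only the member counts
// (30 and 15 doubles) are taken from the question.
struct Elem  { double data[30]; };
struct Probe { double data[15]; };

// Bytes occupied by `count` contiguous objects of type T,
// e.g. the payload of a std::vector<T> with `count` entries.
template <typename T>
constexpr std::size_t footprint_bytes(std::size_t count) {
    return count * sizeof(T);
}

// Approximate sizes from the question: 10k elements, 10 probes.
constexpr std::size_t kElemBytes  = footprint_bytes<Elem>(10000);
constexpr std::size_t kProbeBytes = footprint_bytes<Probe>(10);
```

The real classes may carry extra members or padding, so these numbers are a lower bound rather than a measurement.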

Answer

Today's CPUs are heavily optimized for linear access to memory. Therefore, a few long loops will beat many short loops. You want the inner loop to iterate over the long vector.
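One way to check this on your own machine is to time both loop orders on identical data. The sketch below is not the asker's code: `Elem`, `Probe`, the one-dimensional distance test, the running-mean update rule and the helper names `elementsOuter`, `probesOuter`, `secondsFor` and `expectSameResults` are all assumptions made for illustration. It uses `std::chrono::steady_clock`, which measures elapsed wall-clock time, instead of `std::clock()`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's element type (30 doubles in total).
struct Elem {
    double position = 0.0;
    double vel = 0.0;
    double other[28] = {};  // filler so the object size matches the question
};

// Hypothetical stand-in for the question's probe type.
struct Probe {
    double center = 0.0;
    double radius = 1.0;
    double avg_vel = 0.0;
    std::size_t count = 0;

    // Assumed update rule: running mean of the velocities of all elements
    // whose position lies within `radius` of `center`.
    bool insertParticleData(const Elem& e) {
        const double d = e.position > center ? e.position - center
                                             : center - e.position;
        if (d > radius) return false;
        ++count;
        const double beta = 1.0 / static_cast<double>(count);
        const double alpha = 1.0 - beta;
        avg_vel = alpha * avg_vel + beta * e.vel;
        return true;
    }
};

// Implementation 1: outer loop over the long vector.
void elementsOuter(const std::vector<Elem>& elements, std::vector<Probe>& probes) {
    for (const auto& elem : elements)
        for (auto& probe : probes)
            probe.insertParticleData(elem);
}

// Implementation 2: inner loop over the long vector.
void probesOuter(const std::vector<Elem>& elements, std::vector<Probe>& probes) {
    for (auto& probe : probes)
        for (const auto& elem : elements)
            probe.insertParticleData(elem);
}

// Runs `f` once and returns the elapsed wall-clock time in seconds.
template <typename F>
double secondsFor(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Each probe sees the elements in the same order under both loop nests,
// so the two variants should produce identical results.
void expectSameResults(const std::vector<Probe>& a, const std::vector<Probe>& b) {
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        assert(a[i].count == b[i].count);
        assert(a[i].avg_vel == b[i].avg_vel);
    }
}
```

To use it, fill one `std::vector<Elem>` and two identical `std::vector<Probe>` copies, pass a lambda that calls `elementsOuter` or `probesOuter` to `secondsFor`, and confirm with `expectSameResults` that both variants did the same work before comparing the timings. Which order wins, and by how much, depends on the compiler, the optimization flags and the hardware, so it is worth repeating each measurement several times with optimizations enabled.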
