C++ nested loop performance
I basically have two vectors: one holding a large number of elements and a second holding a small number of probes used to sample data from the elements. I stumbled over the question of which order to nest the two loops in. Naturally, I thought that having the outer loop run over the larger vector would be beneficial.
Implementation 1:
for (auto& elem : elements) {
    for (auto& probe : probes) {
        probe.insertParticleData(elem);
    }
}
However, it seems that the second implementation takes only half the time.
Implementation 2:
for (auto& probe : probes) {
    for (auto& elem : elements) {
        probe.insertParticleData(elem);
    }
}
What could be the reason for that?
Edit:
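Timings were generated by the following code:
#include <ctime>  // std::clock, std::clock_t, CLOCKS_PER_SEC

std::clock_t t_begin_ps = std::clock();
... // timed code
std::clock_t t_end_ps = std::clock();
// std::clock reports processor time used by the program, converted to seconds here
double elapsed_secs_ps = double(t_end_ps - t_begin_ps) / CLOCKS_PER_SEC;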
When inserting the element data I basically do two things: test whether the distance to the probe is below a limit, and then compute an average:
bool probe::insertParticleData(const elem& pP) {
    // Skip elements that lie outside this probe's sampling region
    if (!isInside(pP.position())) { return false; }
    ... // compute alpha and beta
    // Blend the element's velocity into the probe's running average
    avg_vel = alpha*avg_vel + beta*pP.getVel();
    return true;
}
To give an idea of the memory usage: I have approx. 10k elements, which are objects with 30 double data members. For the test I used 10 probes, each containing 15 doubles.
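To put those numbers in perspective, here is a rough footprint estimate. It is only a sketch: the Elem and Probe layouts below are hypothetical stand-ins that contain nothing but the doubles mentioned above, and it assumes 8-byte doubles with no padding or extra members.

```cpp
#include <cstddef>

// Hypothetical layouts: only the doubles quoted above, nothing else.
struct Elem  { double d[30]; };
struct Probe { double d[15]; };

constexpr std::size_t kElements = 10000;  // "approx. 10k" elements
constexpr std::size_t kProbes   = 10;

// Total bytes touched when sweeping each vector once.
constexpr std::size_t elementBytes() { return kElements * sizeof(Elem); }
constexpr std::size_t probeBytes()   { return kProbes * sizeof(Probe); }

// Assumption of this sketch: double is 8 bytes wide.
static_assert(sizeof(double) == 8, "estimate assumes 8-byte doubles");
static_assert(sizeof(Elem) == 240 && sizeof(Probe) == 120,
              "estimate assumes no padding");
```

Under these assumptions the elements take about 2.4 MB, which is far more than a typical L1 data cache of a few tens of kilobytes, while all probes together take about 1.2 KB and fit easily.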
Today's CPUs are heavily optimized for linear access to memory. Therefore, a few long loops will usually beat many short loops. You want the inner loop to iterate over the long vector.
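Since the effect depends on the hardware and the compiler, it is worth measuring both orders on the real data. The following is a minimal benchmark sketch, not the asker's code: Elem, Probe, makeElements, makeProbes, secondsFor and checkSameResults are hypothetical names, the one-dimensional isInside test is a simplification, and the running-mean choice of alpha and beta is an assumption because the question elides that part.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical element: 30 doubles, as in the question.
struct Elem {
    double pos = 0.0, vel = 0.0;
    double other[28] = {};
    double position() const { return pos; }
    double getVel() const { return vel; }
};

// Hypothetical probe: 15 doubles plus a hit counter.
struct Probe {
    double center = 0.0, radius = 0.0, avg_vel = 0.0;
    double other[12] = {};
    std::size_t hits = 0;

    bool isInside(double x) const { return std::fabs(x - center) <= radius; }

    bool insertParticleData(const Elem& e) {
        if (!isInside(e.position())) { return false; }
        ++hits;
        // Assumed weights: plain running mean (the question elides alpha/beta).
        const double beta = 1.0 / static_cast<double>(hits);
        const double alpha = 1.0 - beta;
        avg_vel = alpha * avg_vel + beta * e.getVel();
        return true;
    }
};

// Implementation 1: outer loop over the long vector.
void elementsOuter(const std::vector<Elem>& elements, std::vector<Probe>& probes) {
    for (const auto& elem : elements)
        for (auto& probe : probes)
            probe.insertParticleData(elem);
}

// Implementation 2: inner loop over the long vector.
void probesOuter(const std::vector<Elem>& elements, std::vector<Probe>& probes) {
    for (auto& probe : probes)
        for (const auto& elem : elements)
            probe.insertParticleData(elem);
}

// Wall-clock seconds taken by one call of f, using a monotonic clock.
template <class F>
double secondsFor(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Synthetic test data: positions cycle through 0..99.
std::vector<Elem> makeElements(std::size_t n) {
    std::vector<Elem> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i].pos = static_cast<double>(i % 100);
        v[i].vel = 0.5 * static_cast<double>(i);
    }
    return v;
}

// Synthetic probes centered at 0, 10, 20, ... with radius 5.
std::vector<Probe> makeProbes(std::size_t n) {
    std::vector<Probe> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i].center = 10.0 * static_cast<double>(i);
        v[i].radius = 5.0;
    }
    return v;
}

// Both loop orders feed each probe the same elements in the same order,
// so hit counts must match and averages should agree up to rounding.
void checkSameResults(const std::vector<Probe>& a, const std::vector<Probe>& b) {
    assert(a.size() == b.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        assert(a[i].hits == b[i].hits);
        assert(std::fabs(a[i].avg_vel - b[i].avg_vel)
               <= 1e-9 * (1.0 + std::fabs(a[i].avg_vel)));
    }
}
```

A typical use would be to build one element vector and two fresh probe vectors, time elementsOuter and probesOuter with secondsFor on each, and call checkSameResults afterwards. Repeating each measurement several times and compiling with optimizations enabled gives more trustworthy numbers than a single run.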