How does Intel TBB's scalable_allocator work?
Question
What does the tbb::scalable_allocator in Intel Threading Building Blocks actually do under the hood?
It can certainly be effective. I've just used it to take 25% off an app's execution time (and seen CPU utilization increase from ~200% to 350% on a 4-core system) by changing a single std::vector<T> to std::vector<T, tbb::scalable_allocator<T>>. On the other hand, in another app I've seen it double an already large memory consumption and send things to swap city.
Intel's own documentation doesn't give a lot away (e.g. a short section at the end of this FAQ). Can anyone tell me what tricks it uses before I go and dig into its code myself?
UPDATE: Just using TBB 3.0 for the first time, and I've seen my best speedup from scalable_allocator yet. Changing a single vector<int> to vector<int, scalable_allocator<int>> reduced the runtime of something from 85s to 35s (Debian Lenny, Core2, with TBB 3.0 from testing).
Answer
There is a good paper on the allocator: The Foundations for Scalable Multi-core Software in Intel Threading Building Blocks
My limited experience: I overloaded the global new/delete with tbb::scalable_allocator for my AI application. But there was little change in the time profile. I didn't compare the memory usage, though.