Problem with STL vector performance, benchmarks included
I found that old post:
http://groups.google.com/group/comp....519204726d01e8
I just erased the #include <kubux...> lines.
****** old post for your convenience ******
You are right:
#include <algorithm>  // std::fill_n
#include <ctime>      // std::clock
#include <iostream>
#include <iterator>   // std::reverse_iterator
#include <memory>     // std::allocator
#include <vector>

// The allocator typedefs used below (pointer, reference, ...) were removed
// from std::allocator in C++20, so build this as C++17 or earlier.
template < typename T, typename Alloc = std::allocator<T> >
class stupid {
public:

  typedef Alloc allocator;
  typedef typename allocator::value_type value_type;
  typedef typename allocator::size_type size_type;
  typedef typename allocator::difference_type difference_type;
  typedef typename allocator::pointer pointer;
  typedef typename allocator::const_pointer const_pointer;
  typedef typename allocator::reference reference;
  typedef typename allocator::const_reference const_reference;

  typedef pointer iterator;
  typedef const_pointer const_iterator;
  typedef typename std::reverse_iterator< iterator > reverse_iterator;
  typedef typename std::reverse_iterator< const_iterator >
    const_reverse_iterator;

private:

  pointer ptr;
  size_type the_size;

public:

  stupid ( size_type length ) :
    // raw storage from the allocator, so that it matches deallocate() below
    ptr ( allocator().allocate( length ) ),
    the_size ( length )
  {
    // construct every element in place
    for ( iterator iter = this->ptr;
          iter != this->ptr + the_size;
          ++ iter ) {
      ::new( static_cast<void*>(iter) ) T();
    }
  }

  ~stupid ( void ) {
    // destroy the elements in reverse order, then release the storage
    iterator iter = ptr + the_size;
    while ( iter > ptr ) {
      -- iter;
      iter->~T();
    }
    {
      allocator alloc;
      alloc.deallocate( ptr, the_size );
    }
    the_size = 0;
  }

  reference operator[] ( size_type index ) {
    return( this->ptr[ index ] );
  }

  const_reference operator[] ( size_type index ) const {
    return( this->ptr[ index ] );
  }

}; // stupid

int main ( void ) {
  const unsigned long l = 50000000;

  {
    // std::vector, indexed access
    std::vector< int > v ( l );
    std::clock_t loop_start = std::clock();
    for ( unsigned long i = 0; i < l; ++i ) {
      v[i] = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "vector: " << loop_end - loop_start << std::endl;
  }

  {
    // raw array, indexed access
    int* v = new int [ l ];
    std::fill_n(v, l, 0);
    std::clock_t loop_start = std::clock();
    for ( unsigned long i = 0; i < l; ++i ) {
      v[i] = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "array: " << loop_end - loop_start << std::endl;
    delete [] v;
  }

  {
    // the minimal container above, indexed access
    stupid< int, std::allocator<int> > v ( l );
    std::clock_t loop_start = std::clock();
    for ( unsigned long i = 0; i < l; ++i ) {
      v[i] = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "stupid: " << loop_end - loop_start << std::endl;
  }

  {
    // std::vector, iterator access
    std::vector<int> v ( l );
    std::clock_t loop_start = std::clock();
    for ( std::vector<int>::iterator i = v.begin();
          i != v.end(); ++i ) {
      *i = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "ptr: " << loop_end - loop_start << std::endl;
  }

  {
    // raw array, pointer access
    int* v = new int [ l ];
    std::fill_n(v, l, 0);
    std::clock_t loop_start = std::clock();
    for ( int* i = v; i < v+l; ++i ) {
      *i = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "ptr: " << loop_end - loop_start << std::endl;
    delete [] v;
  }
}
a.out
vector: 320000
array: 320000
stupid: 350000
iterator: 340000
ptr: 340000
No surprises anymore.
Thanks
Kai-Uwe Bux
***************************************************
I ran the reported test on Visual Studio Professional 2005 with its
standard STL implementation, which should be supplied by Dinkumware.
My CPU is a dual-core T2500 with 2 GB of DDR2.
I tried both the Intel 9.1 compiler and the Microsoft one.
In both cases I used the O3 optimizations, release mode, and with the
Intel one I also tried the /Qansi_alias /Qipo options.
Results:
Microsoft:
vector: 141
array: 94
stupid: 93
ptr: 172
ptr: 78
Intel:
vector: 312
array: 156 // becomes 45 if I require P4 extensions; the other values remain nearly the same
stupid: 157
ptr: 1047
ptr: 156
I admit I'm quite disappointed with the results obtained with the Intel
compiler.
Is there any fault in the way the test was conducted or in the
source code I posted?
If everything is correct, how could I investigate where the
problem is?
Cheers
StephQ
On 30 Apr 2007 05:48:31 -0700, StephQ wrote:
> I ran the reported test on Visual Studio Professional 2005 with its
> standard STL implementation, which should be supplied by Dinkumware.
> My CPU is a dual-core T2500 with 2 GB of DDR2.
> I tried both the Intel 9.1 compiler and the Microsoft one.
> In both cases I used the O3 optimizations, release mode, and with the
> Intel one I also tried the /Qansi_alias /Qipo options.
Have you turned off checked iterators? (see:
http://www.codeproject.com/vcpp/stl/...diterators.asp)
--
Roland Pibinger
"The best software is simple, elegant, and full of drama" - Grady Booch
> Have you turned off checked iterators? (see:
> http://www.codeproject.com/vcpp/stl/...diterators.asp)
Thank you for the very useful suggestion. I didn't know that checked
iterators were turned on even in release mode in VC8 by default.
The new results (with checked iterators turned off) are:
Microsoft:
vector: 94
array: 94
stupid: 94
ptr: 141
ptr: 96
Intel:
vector: 141
array: 141 // 62 if I enable SSE2
stupid: 141 // 62 if I enable SSE2 and disable exception handling
ptr: 141
ptr: 140
The situation is now much better.
However, it seems that the Microsoft compiler is still doing 35% better
in all the situations except the "vector iterator" one.
Do you have any other suggestions to try?
I know nothing about low-level instructions, but if I posted the
"assembler-like" code here, would it be of any help to you?
Thank you
Cheers
StephQ
I am replying to myself just to say that I don't mind investigating
these issues any further.
I ran the test using doubles instead of ints and the results are very
similar, with the Microsoft compiler having something like 3% more
performance.
However, the Stepanov abstraction test favours the Intel compiler by a
large margin.
Abstraction penalty with Intel:
0.85
0.68 with SSE2
With Microsoft:
1.11
A curiosity: how is it possible to get an abstraction penalty
below 1?
Cheers
StephQ