Problem with STL vector performance, benchmarks included


Problem description

I found that old post:
http://groups.google.com/group/comp....519204726d01e8

I just removed the #include <kubux.....> lines.

****** old post for your convenience ********
You are right:

#include <algorithm>   // std::fill_n
#include <ctime>       // std::clock
#include <iostream>
#include <iterator>    // std::reverse_iterator
#include <memory>      // std::allocator
#include <vector>

// The three <kubux/bits/...> allocator headers from the old post are not
// needed for this test and have been removed.

// Note: allocator::pointer, allocator::reference and friends were removed
// from std::allocator in C++20, so build this as C++14 or earlier.
template < typename T, typename Alloc = std::allocator<T> >
class stupid {
public:

  typedef Alloc allocator;
  typedef typename allocator::value_type value_type;
  typedef typename allocator::size_type size_type;
  typedef typename allocator::difference_type difference_type;
  typedef typename allocator::pointer pointer;
  typedef typename allocator::const_pointer const_pointer;
  typedef typename allocator::reference reference;
  typedef typename allocator::const_reference const_reference;

  typedef pointer iterator;
  typedef const_pointer const_iterator;
  typedef typename std::reverse_iterator< iterator > reverse_iterator;
  typedef typename std::reverse_iterator< const_iterator >
    const_reverse_iterator;

private:

  pointer ptr;
  size_type the_size;

public:

  // Get raw storage from the allocator, then value-construct each element
  // in place. (The code as posted used new T[length] here, which does not
  // match the deallocate() call in the destructor.)
  stupid ( size_type length ) :
    ptr ( allocator().allocate( length ) ),
    the_size ( length )
  {
    for ( iterator iter = this->ptr;
          iter != this->ptr + the_size;
          ++ iter ) {
      ::new( static_cast<void*>(iter) ) T();
    }
  }

  // Destroy the elements in reverse order, then release the storage.
  ~stupid ( void ) {
    iterator iter = ptr + the_size;
    while ( iter > ptr ) {
      -- iter;
      iter->~T();
    }
    {
      allocator alloc;
      alloc.deallocate( ptr, the_size );
    }
    the_size = 0;
  }

  reference operator[] ( size_type index ) {
    return( this->ptr[ index ] );
  }

  const_reference operator[] ( size_type index ) const {
    return( this->ptr[ index ] );
  }

}; // stupid

int main ( void ) {
  const unsigned long l = 50000000;
  {
    // std::vector, indexed access
    std::vector< int > v ( l );
    std::clock_t loop_start = std::clock();
    for ( unsigned long i = 0; i < l; ++i ) {
      v[i] = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "vector: " << loop_end - loop_start << std::endl;
  }
  {
    // raw array, indexed access
    int* v = new int [ l ];
    std::fill_n(v, l, 0);
    std::clock_t loop_start = std::clock();
    for ( unsigned long i = 0; i < l; ++i ) {
      v[i] = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "array: " << loop_end - loop_start << std::endl;
    delete [] v;
  }
  {
    // hand-written container, indexed access
    stupid< int, std::allocator<int> > v ( l );
    std::clock_t loop_start = std::clock();
    for ( unsigned long i = 0; i < l; ++i ) {
      v[i] = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "stupid: " << loop_end - loop_start << std::endl;
  }
  {
    // std::vector, iterator access (printed as "ptr" in the code as posted)
    std::vector<int> v ( l );
    std::clock_t loop_start = std::clock();
    for ( std::vector<int>::iterator i = v.begin();
          i != v.end(); ++i ) {
      *i = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "ptr: " << loop_end - loop_start << std::endl;
  }
  {
    // raw array, pointer access
    int* v = new int [ l ];
    std::fill_n(v, l, 0);
    std::clock_t loop_start = std::clock();
    for ( int* i = v; i < v+l; ++i ) {
      *i = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "ptr: " << loop_end - loop_start << std::endl;
    delete [] v;
  }

}

a.out

vector: 320000
array: 320000
stupid: 350000
iterator: 340000
ptr: 340000

No surprises anymore.

Thanks

Kai-Uwe Bux
***************************************************

I ran the reported test on Visual Studio Professional 2005 with its
standard STL implementation, which should be supplied by Dinkumware.
My CPU is a dual-core T2500 with 2 GB of DDR2.

I tried both the Intel 9.1 compiler and the Microsoft one.
In both cases I used the O3 optimizations, release mode, and with the
Intel one I also tried the /Qansi_alias /Qipo options.

Results:

Microsoft:
vector: 141
array: 94
stupid: 93
ptr: 172
ptr: 78

Intel:
vector: 312
array: 156 // becomes 45 if I require P4 extensions; the other values remain nearly the same
stupid: 157
ptr: 1047
ptr: 156

I admit I'm quite disappointed with the results obtained with the Intel
compiler.
Is there any fault in the way the test was conducted or in the
source code I posted?
If everything is correct, how could I investigate where the
problem is?

Cheers
StephQ

Solution

On 30 Apr 2007 05:48:31 -0700, StephQ wrote:

> I ran the reported test on Visual Studio Professional 2005 with its
> standard STL implementation, which should be supplied by Dinkumware.
> My CPU is a dual-core T2500 with 2 GB of DDR2.
> I tried both the Intel 9.1 compiler and the Microsoft one.
> In both cases I used the O3 optimizations, release mode, and with the
> Intel one I also tried the /Qansi_alias /Qipo options.

Have you turned off checked iterators? (see:
http://www.codeproject.com/vcpp/stl/...diterators.asp)
--
Roland Pibinger
"The best software is simple, elegant, and full of drama" - Grady Booch


> Have you turned off checked iterators? (see: http://www.codeproject.com/vcpp/stl/...diterators.asp)

Thank you for the very useful suggestion. I didn't know that checked
iterators were turned on by default in VC8, even in release mode.

The new results (with checked iterators turned off) are:

Microsoft:
vector: 94
array: 94
stupid: 94
ptr: 141
ptr: 96

Intel:
vector: 141
array: 141 //62 if I enable SSE2
stupid: 141 //62 if I enable SSE2 and disable exception handling
ptr: 141
ptr: 140

The situation is now much better.
However, it seems that the Microsoft compiler is still doing 35% better
in all the situations except the "vector iterator" one.

Do you have any other suggestions to try?
I know nothing about low-level instructions, but if I post the
assembler-like code here, would it be of any help to you?

Thank you

Cheers
StephQ


On Apr 30, 4:32 pm, StephQ <askmeo...@mailinator.com> wrote:

> > Have you turned off checked iterators? (see: http://www.codeproject.com/vcpp/stl/...diterators.asp)
>
> Thank you for the very useful suggestion. I didn't know that checked
> iterators were turned on by default in VC8, even in release mode.
>
> The new results (with checked iterators turned off) are:
>
> Microsoft:
> vector: 94
> array: 94
> stupid: 94
> ptr: 141
> ptr: 96
>
> Intel:
> vector: 141
> array: 141 //62 if I enable SSE2
> stupid: 141 //62 if I enable SSE2 and disable exception handling
> ptr: 141
> ptr: 140
>
> The situation is now much better.
> However, it seems that the Microsoft compiler is still doing 35% better
> in all the situations except the "vector iterator" one.
>
> Do you have any other suggestions to try?
> I know nothing about low-level instructions, but if I post the
> assembler-like code here, would it be of any help to you?
>
> Thank you
>
> Cheers
> StephQ

I'm replying to myself just to say that I don't mind investigating
these issues any further.
I ran the test using doubles instead of ints and the results are very
similar, with the Microsoft compiler having something like 3% more
performance.

However, the Stepanov abstraction test favours the Intel compiler by a
large margin.
Abstraction penalty with Intel:
0.85
0.68 with sse2

With Microsoft:
1.11

A curiosity: how is it possible to get an abstraction penalty
below 1?

Cheers
StephQ

