Problem with STL vector performance, benchmarks included

Views: 50 · Published: 2019/6/6 17:47:07


Problem description


I found that old post:
http://groups.google.com/group/comp....519204726d01e8

I just erased the #include <kubux.....> lines.

****** old post for your convenience ********
You are right:

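#include <algorithm>   // std::fill_n
#include <ctime>
#include <iostream>
#include <iterator>    // std::reverse_iterator
#include <memory>
#include <vector>

// Private headers from the old post; erased for the runs reported below.
// #include <kubux/bits/allocator.cc>
// #include <kubux/bits/new_delete_allocator.cc>
// #include <kubux/bits/malloc_free_allocator.cc>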

template < typename T, typename Alloc = std::allocator<T> >
class stupid {
public:

  typedef Alloc allocator;
  typedef typename allocator::value_type value_type;
  typedef typename allocator::size_type size_type;
  typedef typename allocator::difference_type difference_type;
  typedef typename allocator::pointer pointer;
  typedef typename allocator::const_pointer const_pointer;
  typedef typename allocator::reference reference;
  typedef typename allocator::const_reference const_reference;

  typedef pointer iterator;
  typedef const_pointer const_iterator;
  typedef typename std::reverse_iterator< iterator > reverse_iterator;
  typedef typename std::reverse_iterator< const_iterator > const_reverse_iterator;

private:

  pointer ptr;
  size_type the_size;

public:

  // Take raw storage from the same allocator that releases it in the
  // destructor (the posted code used new T [ length ] here, which does
  // not pair with alloc.deallocate), then construct each element in place.
  stupid ( size_type length ) :
    ptr ( allocator().allocate( length ) ),
    the_size ( length )
  {
    for ( iterator iter = this->ptr;
          iter != this->ptr + the_size;
          ++ iter ) {
      ::new( static_cast<void*>(iter) ) T();
    }
  }

  // Destroy the elements in reverse order, then release the storage.
  ~stupid ( void ) {
    iterator iter = ptr + the_size;
    while ( iter > ptr ) {
      -- iter;
      iter->~T();
    }
    {
      allocator alloc;
      alloc.deallocate( ptr, the_size );
    }
    the_size = 0;
  }

  reference operator[] ( size_type index ) {
    return( this->ptr[ index ] );
  }

  const_reference operator[] ( size_type index ) const {
    return( this->ptr[ index ] );
  }

}; // stupid

int main ( void ) {
  const unsigned long l = 50000000;
  {
    // std::vector, indexed access
    std::vector< int > v ( l );
    std::clock_t loop_start = std::clock();
    for ( unsigned long i = 0; i < l; ++i ) {
      v[i] = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "vector: " << loop_end - loop_start << std::endl;
  }
  {
    // plain new[] array, indexed access
    int* v = new int [ l ];
    std::fill_n(v, l, 0);
    std::clock_t loop_start = std::clock();
    for ( unsigned long i = 0; i < l; ++i ) {
      v[i] = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "array: " << loop_end - loop_start << std::endl;
    delete [] v;
  }
  {
    // minimal hand-written container, indexed access
    stupid< int, std::allocator<int> > v ( l );
    std::clock_t loop_start = std::clock();
    for ( unsigned long i = 0; i < l; ++i ) {
      v[i] = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "stupid: " << loop_end - loop_start << std::endl;
  }
  {
    // std::vector, iterator access (the "vector iterator" case; the
    // posted code prints it under the label "ptr")
    std::vector<int> v ( l );
    std::clock_t loop_start = std::clock();
    for ( std::vector<int>::iterator i = v.begin();
          i != v.end(); ++i ) {
      *i = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "ptr: " << loop_end - loop_start << std::endl;
  }
  {
    // plain new[] array, pointer access
    int* v = new int [ l ];
    std::fill_n(v, l, 0);
    std::clock_t loop_start = std::clock();
    for ( int* i = v; i < v+l; ++i ) {
      *i = 5;
    }
    std::clock_t loop_end = std::clock();
    std::cout << "ptr: " << loop_end - loop_start << std::endl;
    delete [] v;
  }

}

a.out

vector: 320000
array: 320000
stupid: 350000
iterator: 340000
ptr: 340000

No surprises anymore.

Thanks

Kai-Uwe Bux
***************************************************

I ran the reported test on Visual Studio Professional 2005 with its
standard STL implementation, which should be supplied by Dinkumware.
My CPU is a dual-core T2500 with 2 GB of DDR2.

I tried both the Intel 9.1 compiler and the Microsoft one.
In both cases I used the O3 optimizations, release mode, and with the
Intel one I also tried the /Qansi_alias /Qipo options.

Results:

Microsoft:
vector: 141
array: 94
stupid: 93
ptr: 172
ptr: 78

Intel:
vector: 312
array: 156 // becomes 45 if I require P4 extensions; the other values remain nearly the same
stupid: 157
ptr: 1047
ptr: 156

I admit I'm quite disappointed with the results obtained with the Intel
compiler.
Is there any fault in the way the test was conducted or in the
source code I posted?
If everything is correct, how could I investigate where the
problem is?

Cheers
StephQ

Solution

On 30 Apr 2007 05:48:31 -0700, StephQ wrote:
> I ran the reported test on visual studio professional 2005 with its
> standard STL implementation, which should be supplied by Dinkumware.
> My cpu is a dual core t2500 with 2gb ddr2.
> I tried both the intel 9.1 compiler and the Microsoft one.
> In both cases I used the O3 optimizations, release mode, and with the
> Intel one I also tried the /Qansi_alias /Qipo options.

Have you turned off checked iterators? (see:
http://www.codeproject.com/vcpp/stl/...diterators.asp)

--
Roland Pibinger
"The best software is simple, elegant, and full of drama" - Grady Booch

> Have you turned off checked iterators? (see: http://www.codeproject.com/vcpp/stl/...diterators.asp)

Thank you for the very useful suggestion. I didn't know that checked
iterators were turned on by default even in release mode in vc8.

The new results (with checked iterators turned off) are:

Microsoft:
vector: 94
array: 94
stupid: 94
ptr: 141
ptr: 96

Intel:
vector: 141
array: 141 // 62 if I enable SSE2
stupid: 141 // 62 if I enable SSE2 and disable exception handling
ptr: 141
ptr: 140

The situation is now much better.
However, it seems that the Microsoft compiler is still doing 35% better
in all the situations except the "vector iterator" one.

Do you have any other suggestions to try?
I know nothing about low-level instructions, but if I post the
"assembler-like" code here, would it be of any help to you?

Thank you

Cheers
StephQ

On Apr 30, 4:32 pm, StephQ <askmeo...@mailinator.com> wrote:
> [full quote of the previous message snipped]

I am replying to myself just to tell you that I don't mind investigating
these issues any further.
I ran the test using doubles instead of ints and the results are very
similar, with the Microsoft compiler having something like 3% more
performance.

However the Stepanov Abstraction test favours the intel compiler by a
large margin.
Abstraction penalty with Intel:
0.85
0.68 with sse2

With Microsoft:
1.11

A curiosity: how is it possible to get an abstraction penalty
below 1?

Cheers
StephQ
