(Missing) performance improvements with C++11 move semantics


Question


I've been writing C++11 code for quite some time now, and haven't done any benchmarking of it, only expecting things like vector operations to "just be faster" now with move semantics. So when actually benchmarking with GCC 4.7.2 and clang 3.0 (default compilers on Ubuntu 12.10 64-bit) I get very unsatisfying results. This is my test code:

EDIT: With regard to the (good) answers posted by @DeadMG and @ronag, I changed the element type from std::string to my::string, which does not have a swap(), and made all inner strings larger (200-700 bytes) so that they shouldn't be the victims of SSO.

EDIT2: COW was the reason. I adapted the code again following the great comments, changing the storage from std::string to std::vector<char> and leaving out the copy/move constructors (letting the compiler generate them instead). Without COW, the speed difference is actually huge.

EDIT3: Re-added the previous solution when compiled with -DCOW. This makes the internal storage a std::string rather than a std::vector<char> as requested by @chico.

#include <string>
#include <vector>
#include <fstream>
#include <iostream>
#include <algorithm>
#include <functional>

static std::size_t dec = 0;

namespace my { class string
{
public:
    string( ) { }
#ifdef COW
    string( const std::string& ref ) : str( ref ), val( dec % 2 ? - ++dec : ++dec ) {
#else
    string( const std::string& ref ) : val( dec % 2 ? - ++dec : ++dec ) {
        str.resize( ref.size( ) );
        std::copy( ref.begin( ), ref.end( ), str.begin( ) );
#endif
    }

    bool operator<( const string& other ) const { return val < other.val; }

private:
#ifdef COW
    std::string str;
#else
    std::vector< char > str;
#endif
    std::size_t val;
}; }


template< typename T >
void dup_vector( T& vec )
{
    T v = vec;
    for ( typename T::iterator i = v.begin( ); i != v.end( ); ++i )
#ifdef CPP11
        vec.push_back( std::move( *i ) );
#else
        vec.push_back( *i );
#endif
}

int main( )
{
    std::ifstream file;
    file.open( "/etc/passwd" );
    std::vector< my::string > lines;
    while ( ! file.eof( ) )
    {
        std::string s;
        std::getline( file, s );
        lines.push_back( s + s + s + s + s + s + s + s + s );
    }

    while ( lines.size( ) < ( 1000 * 1000 ) )
        dup_vector( lines );
    std::cout << lines.size( ) << " elements" << std::endl;

    std::sort( lines.begin( ), lines.end( ) );

    return 0;
}

What this does is read /etc/passwd into a vector of lines, then duplicate this vector onto itself over and over until we have at least 1 million entries. This is where the first optimization should be useful: not only the explicit std::move() you see in dup_vector(), but push_back itself should also perform better when it needs to resize (create new + copy) the inner array.

Finally, the vector is sorted. This should definitely be faster when you don't need to copy temporary objects each time two elements are swapped.

I compile and run this two ways, one being as C++98, the next as C++11 (with -DCPP11 for the explicit move):

1> $ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
2> $ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
3> $ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
4> $ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out

With the following results (twice for each compilation):

GCC C++98
1> real 0m9.626s
1> real 0m9.709s

GCC C++11
2> real 0m10.163s
2> real 0m10.130s

So, it's slightly slower to run when compiled as C++11 code. Similar results go for clang:

clang C++98
3> real 0m8.906s
3> real 0m8.750s

clang C++11
4> real 0m8.858s
4> real 0m9.053s

Can someone tell me why this is? Are the compilers optimizing so well, even when compiling for pre-C++11, that they practically reach move-semantics behaviour after all? If I add -O2, all code runs faster, but the results between the different standards are almost the same as above.

EDIT: New results with my::string rather than std::string, and larger individual strings:

$ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
real    0m16.637s
$ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
real    0m17.169s
$ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
real    0m16.222s
$ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
real    0m15.652s

There are very small differences between C++98 and C++11 with move semantics: slightly slower with C++11 on GCC and slightly faster on clang, but the differences are still very small.

EDIT2: Now without std::string's COW, the performance improvement is huge:

$ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
real    0m10.313s
$ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
real    0m5.267s
$ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
real    0m10.218s
$ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
real    0m3.376s

With optimization, the difference is a lot bigger too:

$ rm -f a.out ; g++ -O2 --std=c++98 test.cpp ; time ./a.out
real    0m5.243s
$ rm -f a.out ; g++ -O2 --std=c++11 -DCPP11 test.cpp ; time ./a.out
real    0m0.803s
$ rm -f a.out ; clang++ -O2 --std=c++98 test.cpp ; time ./a.out
real    0m5.248s
$ rm -f a.out ; clang++ -O2 --std=c++11 -DCPP11 test.cpp ; time ./a.out
real    0m0.785s

The above shows C++11 running roughly 6-7 times faster.

Thanks for the great comments and answers. I hope this post will be useful and interesting to others too.

Solution

"This should definitely be faster when you don't need to copy temporary objects each time two elements are swapped."

std::string has a swap member, so sort will already use that, and its internal implementation will already be move semantics, effectively. And you won't see a difference between copy and move for std::string as long as SSO is involved. In addition, some versions of GCC still have a COW-based implementation, which C++11 does not permit, and that also would not see much difference between copy and move.
