C ++性能挑战:整数到std ::字符串转换 [英] C++ performance challenge: integer to std::string conversion

查看:177
本文介绍了C ++性能挑战:整数到std ::字符串转换的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

任何人都可以跳过我的整数到std :: string代码的性能,链接到下面?



已经有几个问题如何在C ++中将整数转换为 std :: string ,例如

这里是compile-> convert-integer-to-string-c现在可以使用一些常用方法来与之竞争:





流行信念 boost :: lexical_cast 有自己的实现(白皮书)并不使用 stringstream 和数字插入运算符。我真的很想看看它的性能比较,因为这个其他问题表明它是可悲的



和我自己的贡献,在台式电脑上是有竞争力的,并展示了在嵌入式系统上全速运行的方法,不同于依赖于整数模的算法:





如果您要使用该代码,我将使用简化的BSD许可证允许,必须提供归属)。只要询问。



最后,函数 ltoa 是非标准的,但广泛可用。





我将很快发布我的绩效测量作为答案。 / p>

算法规则





< h2>用于讨论

除了更好的算法,我还想在几个不同的平台和编译器上获得一些基准(让我们使用MB / s吞吐量作为我们的标准测量单位)。我相信我的算法的代码(我知道 sprintf benchmark需要一些快捷方式 - 现在是固定的)是由标准定义良好的行为,至少在ASCII假设,但如果您看到任何未定义的行为或输入的输出无效,请指出。



结论:



对于g ++和VC2010,不同的算法执行,可能是因为 std :: string 的不同实现。



代码被发现胜过 sprintf

/ code>一个数量级。 ostringstream 落后了50倍以上。



挑战的胜者是user434507,运行我自己在gcc的速度的350%。



当前(最终?)速度冠军是:




解决方案

  #include< string> 

const char digit_pairs [201] = {
00010203040506070809
10111213141516171819
20212223242526272829
30313233343536373839
40414243444546474849
50515253545556575859
60616263646566676869
70717273747576777879
80818283848586878889
90919293949596979899
};


std :: string& itostr(int n,std :: string& s)
{
if(n == 0)
{
s =0
return s;
}

int sign = - (n <0);
unsigned int val =(n ^ sign)-sign;

int size;
if(val> = 10000)
{
if(val> = 10000000)
{
if(val> = 1000000000)
size = 10 ;
else if(val> = 100000000)
size = 9;
else
size = 8;
}
else
{
if(val> = 1000000)
size = 7;
else if(val> = 100000)
size = 6;
else
size = 5;
}
}
else
{
if(val> = 100)
{
if(val> = 1000)
size = 4;
else
size = 3;
}
else
{
if(val> = 10)
size = 2;
else
size = 1;
}
}
size - = sign;
s.resize(size);
char * c =& s [0];
if(sign)
* c =' - ';

c + = size-1;
while(val> = 100)
{
int pos = val%100;
val / = 100;
*(short *)(c-1)= *(short *)(digit_pairs + 2 * pos);
c- = 2;
}
while(val> 0)
{
* c - ='0'+(val%10);
val / = 10;
}
return s;
}

std :: string& itostr(unsigned val,std :: string& s)
{
if(val == 0)
{
s =0
return s;
}

int size;
if(val> = 10000)
{
if(val> = 10000000)
{
if(val> = 1000000000)
size = 10 ;
else if(val> = 100000000)
size = 9;
else
size = 8;
}
else
{
if(val> = 1000000)
size = 7;
else if(val> = 100000)
size = 6;
else
size = 5;
}
}
else
{
if(val> = 100)
{
if(val> = 1000)
size = 4;
else
size = 3;
}
else
{
if(val> = 10)
size = 2;
else
size = 1;
}
}

s.resize(size);
char * c =& s [size-1];
while(val> = 100)
{
int pos = val%100;
val / = 100;
*(short *)(c-1)= *(short *)(digit_pairs + 2 * pos);
c- = 2;
}
while(val> 0)
{
* c - ='0'+(val%10);
val / = 10;
}
return s;
}

这将炸毁不允许未对齐内存访问的系统通过 *(short *)的第一个未对齐赋值将导致segfault),但是应该非常好地工作。



一个重要的事情是最小化使用 std :: string 。 (Ironic,我知道。)例如,在Visual Studio中,大多数对std :: string方法的调用不是内联的,即使在编译器选项中指定/ Ob2。因此,即使像调用 std :: string :: clear()这样你可能期望速度非常快的东西,连接CRT作为一个静态库和链接为DLL时多达300个计时器。



出于同样的原因,通过引用返回是更好的,因为它避免了赋值,构造函数和析构函数。


Can anyone beat the performance of my integer to std::string code, linked below?

There are already several questions that explain how to convert an integer into a std::string in C++, such as this one, but none of the solutions provided are efficient.

Here is compile-ready code for some common methods to compete against:

Contrary to popular belief, boost::lexical_cast has its own implementation (white paper) and does not use stringstream and numeric insertion operators. I'd really like to see its performance compared, because this other question suggests that it's miserable.

And my own contribution, which is competitive on desktop computers, and demonstrates an approach that runs at full speed on embedded systems as well, unlike algorithms dependent on integer modulo:

If you want to use that code, I'll make it available under a simplified BSD license (commercial use allowed, attribution required). Just ask.

Finally, the function ltoa is non-standard but widely available.

I'll post my performance measurements as an answer shortly.

Rules for algorithms

  • Provide code for a conversion of at least 32-bit signed and unsigned integers into decimal.
  • Produce output as a std::string.
  • No tricks that are incompatible with threading and signals (for example, static buffers).
  • You may assume an ASCII character set.
  • Make sure to test your code on INT_MIN on a two's complement machine where the absolute value is not representable.
  • Ideally, the output should be character-for-character identical with the canonical C++ version using stringstream, http://ideone.com/jh3Sa, but anything that is clearly understandable as the correct number is ok, too.
  • NEW: Although you can use whatever compiler and optimizer options (except completely disabled) you want for the comparison, the code needs to also compile and give correct results under at least VC++ 2010 and g++.

Hoped-for Discussion

Besides better algorithms, I'd also like to get some benchmarks on several different platforms and compilers (let's use MB/s throughput as our standard unit of measure). I believe that the code for my algorithm (I know the sprintf benchmark takes some shortcuts -- now fixed) is well-defined behavior by the standard, at least under the ASCII assumption, but if you see any undefined behavior or inputs for which the output is invalid, please point that out.

Conclusions:

Different algorithms perform for g++ and VC2010, likely due to the different implementations of std::string on each. VC2010 clearly does a better job with NRVO, getting rid of return-by-value helped only on gcc.

Code was found that outperforms sprintf by an order of magnitude. ostringstream falls behind by a factor of 50 and more.

The winner of the challenge is user434507 who produces code that runs 350% of the speed of my own on gcc. Further entries are closed due to the whims of the SO community.

The current (final?) speed champions are:

解决方案

#include <string>

const char digit_pairs[201] = {
  "00010203040506070809"
  "10111213141516171819"
  "20212223242526272829"
  "30313233343536373839"
  "40414243444546474849"
  "50515253545556575859"
  "60616263646566676869"
  "70717273747576777879"
  "80818283848586878889"
  "90919293949596979899"
};


std::string& itostr(int n, std::string& s)
{
    if(n==0)
    {
        s="0";
        return s;
    }

    int sign = -(n<0);
    unsigned int val = (n^sign)-sign;

    int size;
    if(val>=10000)
    {
        if(val>=10000000)
        {
            if(val>=1000000000)
                size=10;
            else if(val>=100000000)
                size=9;
            else 
                size=8;
        }
        else
        {
            if(val>=1000000)
                size=7;
            else if(val>=100000)
                size=6;
            else
                size=5;
        }
    }
    else 
    {
        if(val>=100)
        {
            if(val>=1000)
                size=4;
            else
                size=3;
        }
        else
        {
            if(val>=10)
                size=2;
            else
                size=1;
        }
    }
    size -= sign;
    s.resize(size);
    char* c = &s[0];
    if(sign)
        *c='-';

    c += size-1;
    while(val>=100)
    {
       int pos = val % 100;
       val /= 100;
       *(short*)(c-1)=*(short*)(digit_pairs+2*pos); 
       c-=2;
    }
    while(val>0)
    {
        *c--='0' + (val % 10);
        val /= 10;
    }
    return s;
}

std::string& itostr(unsigned val, std::string& s)
{
    if(val==0)
    {
        s="0";
        return s;
    }

    int size;
    if(val>=10000)
    {
        if(val>=10000000)
        {
            if(val>=1000000000)
                size=10;
            else if(val>=100000000)
                size=9;
            else 
                size=8;
        }
        else
        {
            if(val>=1000000)
                size=7;
            else if(val>=100000)
                size=6;
            else
                size=5;
        }
    }
    else 
    {
        if(val>=100)
        {
            if(val>=1000)
                size=4;
            else
                size=3;
        }
        else
        {
            if(val>=10)
                size=2;
            else
                size=1;
        }
    }

    s.resize(size);
    char* c = &s[size-1];
    while(val>=100)
    {
       int pos = val % 100;
       val /= 100;
       *(short*)(c-1)=*(short*)(digit_pairs+2*pos); 
       c-=2;
    }
    while(val>0)
    {
        *c--='0' + (val % 10);
        val /= 10;
    }
    return s;
}

This will blow up on systems that disallow unaligned memory accesses (in which case, the first unaligned assignment via *(short*) would cause a segfault), but should work very nicely otherwise.

One important thing to do is to minimize the use of std::string. (Ironic, I know.) In Visual Studio, for example, most calls to methods of std::string are not inlined, even if you specify /Ob2 in compiler options. So even something as trivial as a call to std::string::clear(), which you might expect to be very fast, can take 100 clockticks when linking CRT as a static library, and as much as 300 clockticks when linking as a DLL.

For the same reason, returning by reference is better because it avoids an assignment, a constructor and a destructor.

这篇关于C ++性能挑战:整数到std ::字符串转换的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆