Optimized itoa function


Question

I am thinking about how to implement the conversion of an integer (4-byte, unsigned) to a string with SSE instructions. The usual routine is to repeatedly divide the number, store the resulting digits in a local buffer, and then reverse the string (the reversal routine is missing in this example):

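/* Converts num to a string in the given base (2..16).
   Returns a pointer into a static buffer, so it is not reentrant. */
char *convert(unsigned int num, int base) {
    static char buff[33];   /* 32 binary digits plus the terminating NUL */

    char *ptr;
    ptr = &buff[sizeof(buff) - 1];
    *ptr = '\0';

    /* Emit digits from least to most significant, filling the buffer backwards. */
    do {
        *--ptr = "0123456789abcdef"[num % base];
        num /= base;
    } while (num != 0);

    return ptr;
}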

But the reversal takes extra time. Is there another algorithm, preferably one that can use SSE instructions, to parallelize the function?

Recommended answer

The first step in optimizing your code is to get rid of the arbitrary base support. Division by a compile-time constant is almost always compiled into a multiplication, whereas division by a runtime base is a real division instruction. In addition, '0'+n is faster than "0123456789abcdef"[n], because the former involves no memory access.
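The fixed-base idea can be sketched as below. This is a minimal illustration, not code from the answer: the name convert10 is hypothetical, and the buffer size assumes unsigned int is 32 bits wide (at most 10 decimal digits).

```c
/* Hypothetical base-10-only variant of convert().
   Assumes a 32-bit unsigned int: at most 10 digits plus the NUL.
   Like the original, it returns a pointer into a static buffer. */
char *convert10(unsigned int num) {
    static char buff[11];
    char *ptr = &buff[sizeof(buff) - 1];
    *ptr = '\0';

    do {
        /* The divisor is a constant, so the compiler is free to
           replace the division with a multiply and shift. */
        *--ptr = (char)('0' + num % 10);   /* no table lookup */
        num /= 10;
    } while (num != 0);

    return ptr;
}
```

Calling convert10(131200) returns a pointer to the string "131200". The loop still fills the buffer from the end, so no separate reversal pass is needed.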

If you need to go beyond that, you could build a lookup table for each byte position of the input, holding the digits of that byte's contribution in the base you care about (e.g. 10), and then vector-add the digit strings for the four bytes. For example:

00 02 00 80 (input)

 0000000000 (place3[0x00])
+0000131072 (place2[0x02])
+0000000000 (place1[0x00])
+0000000128 (place0[0x80])
 ==========
 0000131200 (result)
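The diagram above can be sketched in plain C as follows. Several things here are assumptions rather than part of the answer: the function names init_place_tables and to_decimal_lut are hypothetical, the four tables from the diagram are merged into one array place[k][b], and the carry pass after the addition is one possible way to normalize digit sums above 9. The digit-wise addition is written as a scalar loop for portability; it is the step that a byte-wise vector add would replace.

```c
#include <stdint.h>

/* place[k][b] holds the 10 decimal digits (most significant first)
   of b * 256^k, i.e. the contribution of byte k having value b. */
static uint8_t place[4][256][10];

/* Fill the tables once before the first conversion. */
void init_place_tables(void) {
    for (int k = 0; k < 4; k++) {
        for (int b = 0; b < 256; b++) {
            uint32_t v = (uint32_t)b << (8 * k);
            for (int d = 9; d >= 0; d--) {
                place[k][b][d] = (uint8_t)(v % 10);
                v /= 10;
            }
        }
    }
}

/* Writes exactly 10 zero-padded decimal digits plus a NUL into out. */
void to_decimal_lut(uint32_t num, char out[11]) {
    const uint8_t *p0 = place[0][num & 0xff];
    const uint8_t *p1 = place[1][(num >> 8) & 0xff];
    const uint8_t *p2 = place[2][(num >> 16) & 0xff];
    const uint8_t *p3 = place[3][(num >> 24) & 0xff];
    uint8_t sum[10];

    /* Digit-wise addition: each sum is at most 4 * 9 = 36, so it fits
       in a byte. This loop is the candidate for a vector add. */
    for (int d = 0; d < 10; d++) {
        sum[d] = (uint8_t)(p0[d] + p1[d] + p2[d] + p3[d]);
    }

    /* Carry pass (an assumption of this sketch): bring every digit
       back into the range 0..9, least significant digit first. */
    unsigned carry = 0;
    for (int d = 9; d >= 0; d--) {
        unsigned t = sum[d] + carry;
        out[d] = (char)('0' + t % 10);
        carry = t / 10;
    }
    out[10] = '\0';
}
```

After init_place_tables() has run, to_decimal_lut(0x00020080, out) leaves "0000131200" in out, matching the diagram. The tables take 4 * 256 * 10 bytes, so this trades roughly 10 KB of memory for the per-digit divisions; whether that wins in practice depends on cache behavior and should be measured.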
