Optimized itoa function
Question
I am thinking about how to implement the conversion of an integer (4-byte, unsigned) to a string with SSE instructions. The usual routine is to repeatedly divide the number, store the digits in a local buffer, and then reverse the string (the reversal routine is missing from this example):
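char *convert(unsigned int num, int base) {
    /* 32 binary digits plus the terminator is the worst case (base 2). */
    static char buff[33];
    char *ptr;

    /* Start at the end of the buffer and fill it backwards. */
    ptr = &buff[sizeof(buff) - 1];
    *ptr = '\0';
    do {
        /* The digit table only covers bases 2 through 16. */
        *--ptr = "0123456789abcdef"[num % base];
        num /= base;
    } while (num != 0);
    return ptr;
}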
But the reversal will take extra time. Is there another algorithm, preferably one that can use SSE instructions, to parallelize the function?
Recommended answer
The first step to optimizing your code is getting rid of the arbitrary base support. This is because dividing by a constant is almost surely compiled to a multiplication, whereas dividing by base is a real division, and because '0'+n is faster than "0123456789abcdef"[n] (no memory access is involved in the former).
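As a minimal sketch of that first step, the routine below hard-codes base 10 and computes each digit as '0' + n. The name utoa10 and the caller-supplied 11-byte buffer are assumptions made for illustration; they are not part of the original question or answer.

```c
#include <stdint.h>

/* Hypothetical helper: writes the decimal digits of num at the end of buf
   (which must hold at least 11 bytes) and returns a pointer to the first
   digit. The digits are produced back to front, so nothing is reversed. */
char *utoa10(uint32_t num, char buf[11])
{
    char *ptr = buf + 10;
    *ptr = '\0';
    do {
        /* The divisor is a compile-time constant, so an optimizing compiler
           will typically emit a multiply and shift instead of a divide. */
        *--ptr = (char)('0' + num % 10);
        num /= 10;
    } while (num != 0);
    return ptr;
}
```

Passing the buffer in also avoids the static buffer of the original, which makes the function safe to call from several threads.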
If you need to go beyond that, you could make lookup tables for each byte in the base you care about (e.g. 10), then vector-add the (e.g. decimal) results for each byte. As in:
 00 02 00 80 (input)
 0000000000 (place3[0x00])
+0000131072 (place2[0x02])
+0000000000 (place1[0x00])
+0000000128 (place0[0x80])
 ==========
 0000131200 (result)
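The table idea can be sketched in portable C as follows. The table layout, the names place, init_place_tables and utoa10_tables, and the explicit carry pass are assumptions for illustration; the answer only describes the per-byte tables and the vector add. The carry pass is needed because the digit-wise sums can exceed 9 (in the example above, 2 + 8 in the last column). The scalar loop that adds the four rows is the part that a byte-wise SSE2 add such as _mm_add_epi8 could perform in one instruction per pair of rows.

```c
#include <stdint.h>

#define NDIGITS 10  /* a 32-bit value has at most 10 decimal digits */

/* place[k][b] holds the decimal digits (values 0..9, most significant
   first) of b << (8*k). Names and layout are assumptions for this sketch. */
static uint8_t place[4][256][NDIGITS];

/* Fill the tables once, using ordinary division. */
void init_place_tables(void)
{
    for (int k = 0; k < 4; k++) {
        for (uint32_t b = 0; b < 256; b++) {
            uint32_t v = b << (8 * k);
            for (int d = NDIGITS - 1; d >= 0; d--) {
                place[k][b][d] = (uint8_t)(v % 10);
                v /= 10;
            }
        }
    }
}

/* Writes exactly 10 zero-padded digits plus a terminator into out. */
void utoa10_tables(uint32_t num, char out[NDIGITS + 1])
{
    uint8_t sum[NDIGITS];

    /* Digit-wise add of the four table rows. Each sum is at most
       4 * 9 = 36, so it fits in a byte without overflow. */
    for (int d = 0; d < NDIGITS; d++) {
        sum[d] = (uint8_t)(place[0][num & 0xff][d]
                         + place[1][(num >> 8) & 0xff][d]
                         + place[2][(num >> 16) & 0xff][d]
                         + place[3][num >> 24][d]);
    }

    /* Propagate carries from the least significant digit upward and
       convert each digit to ASCII. */
    unsigned carry = 0;
    for (int d = NDIGITS - 1; d >= 0; d--) {
        unsigned t = sum[d] + carry;
        out[d] = (char)('0' + t % 10);
        carry = t / 10;
    }
    out[NDIGITS] = '\0';
}
```

Call init_place_tables() once before the first conversion. The output is always zero-padded to 10 digits, as in the example above, so a caller that wants the usual itoa result still has to skip the leading zeros. Whether the table lookups beat the divide-by-constant loop depends on cache behavior and should be measured.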