如何glibc的的strlen()实现工程 [英] How the glibc strlen() implementation works

查看:110
本文介绍了如何glibc的的strlen()实现工程的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

的strlen()从K&安培; R只需要几行。

The strlen() from K&R takes only a few lines.

int strlen(char *s)
{
    char *p = s;
    while (*p != '\0')
        p++;
    return p - s;
}

glibc的版本更长的时间。为简单起见,我删除了所有的意见和64位实现,提取的版本的strlen()是这样的:

But the glibc version is much longer. For simplicity, I removed all the comments and the 64-bit implementation, the extracted version strlen() looks like this:

size_t strlen(const char *str)
{
    const char *char_ptr;
    const unsigned long int *longword_ptr;
    unsigned long int longword, magic_bits, himagic, lomagic;

    for (char_ptr = str; ((unsigned long int) char_ptr 
             & (sizeof (longword) - 1)) != 0; ++char_ptr)
       if (*char_ptr == '\0')
           return char_ptr - str;

    longword_ptr = (unsigned long int *) char_ptr;

    himagic = 0x80808080L;
    lomagic = 0x01010101L;

    for (;;)
    { 
        longword = *longword_ptr++;

        if (((longword - lomagic) & himagic) != 0)
        {
            const char *cp = (const char *) (longword_ptr - 1);

            if (cp[0] == 0)
                return cp - str;
            if (cp[1] == 0)
                return cp - str + 1;
            if (cp[2] == 0)
                return cp - str + 2;
            if (cp[3] == 0)
                return cp - str + 3;
        }
    }
}

随着很有帮助注释的帮助(点击这里),我得到了大部分如何这个工程。而不是检查'\\ 0'逐字节,glibc的的strlen()检查每一个字(4字节32位机,64位机8字节)。以这种方式,当字符串相对较长,性能能够得到改善。

With the help of the very helpful comment (click here), I got most of how this works. Instead of checking for '\0' byte by byte, the glibc strlen() checks every word(4 bytes in 32-bit machine, 8 byte in 64-bit machine). In this way, when the string is relatively long, the performance can be improved.

它由字节读取字节,直到 char_ptr 检查前几个字符是长字边界上对齐。然后,它使用一个循环来检查长字与全零的任何字节。如果有,请检查字节,并返回结果。

It checks the first few characters by reading byte by byte, until char_ptr is aligned on a longword boundary. Then it uses a loop to check if the longword has any bytes with of all-zeros. If there is, check which byte, and return the result.

我不明白的部分是,它如何检查长字一个字节是全零?

The part I don't get is, how does this check if one byte in longword is all-zeros?

if (((longword - lomagic) & himagic) != 0)

我可以建立 0x81818181 A 长字价值,它可以使 0x81818181 - 0x01010101 )及0x80808080 不等于 0 ,但目前还没有全零字节。

I can build a longword value of 0x81818181, which can make 0x81818181 - 0x01010101) & 0x80808080 not equal to 0, but there are no all-zero bytes.

难道这与事实ASCII值的范围从 0 127 ,所以 0×81 不是有效的ASCII?但我不认为C标准的力量字符串使用ASCII。

Is this related with the fact that ASCII values range from 0 to 127, so 0x81 isn't valid ASCII? But I don't think the C standard forces strings to use ASCII.

推荐答案

我想通了。不能相信就这么简单,我花了近一个半小时就可以了。

I figured it out. Can't believe it's that simple, I spent the last half an hour on it.

这是确定的,该检查

if (((longword - lomagic) & himagic) != 0)

让像 0x81818181 通价值,因为如果通过,对每一个字节以下测试将不会返回,因为没有全零字节。所以循环可以继续测试下长字

lets values like 0x81818181 pass, because if it passes, the following test on every byte would not return since there are no all-zero byte. So the loop can continue to test the next longword.

支票背后的算法是基于确定一个字具有零字节

The algorithm behind the check is based on Determine if a word has a zero byte

unsigned int v; 
bool hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);

在2的补数, - 0x01010101 + 0xFEFEFEFF 同样的效果。所不同的是,因为glibc不具有 V&安培; 0x7F7F7F7F ,这使得确保字节字有 0 最显著位。像 0x81818181 这prevents的例子,但glibc的忽略它,因为它并没有让它通过如前所述,该检查是正确的,只要它赢得了 ŧ错过了确实有全零字节的任何文字。

In 2's complement, - 0x01010101 has the same effect with + 0xFEFEFEFF. The difference is because glibc doesn't have v & 0x7F7F7F7F, which makes sure the bytes in the word has the most significant bit of 0. This prevents examples like 0x81818181, but glibc omits it because it doesn't have to let it pass as stated earlier, The check is correct as long as it won't miss any word that does have all-zero bytes.

这篇关于如何glibc的的strlen()实现工程的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆