Why does this implementation of strlen() work?

Problem description

(Disclaimer: I've seen this question, and I am not re-asking it -- I am interested in why the code works, and not in how it works.)

So here's this implementation of Apple's (well, FreeBSD's) strlen(). It uses a well-known optimization trick, namely it checks 4 or 8 bytes at once, instead of doing a byte-by-byte comparison to 0:

/* mask01, mask80, LONGPTR_MASK and testbyte() are defined outside
   this excerpt (not shown). */
size_t strlen(const char *str)
{
    const char *p;
    const unsigned long *lp;

    /* Skip the first few bytes until we have an aligned p */
    for (p = str; (uintptr_t)p & LONGPTR_MASK; p++)
        if (*p == '\0')
            return (p - str);

    /* Scan the rest of the string using word sized operation */
    for (lp = (const unsigned long *)p; ; lp++) {
        if ((*lp - mask01) & mask80) {
            p = (const char *)(lp);
            testbyte(0);
            testbyte(1);
            testbyte(2);
            testbyte(3);
#if (LONG_BIT >= 64)
            testbyte(4);
            testbyte(5);
            testbyte(6);
            testbyte(7);
#endif
        }
    }

    /* NOTREACHED */
    return (0);
}

Now my question is: maybe I'm missing the obvious, but can't this read past the end of a string? What if we have a string whose length is not divisible by the word size? Imagine the following scenario:

|<---------------- all your memories are belong to us --------------->|<-- not our memory -->
+-------------+-------------+-------------+-------------+-------------+ - -
|     'A'     |     'B'     |     'C'     |     'D'     |      0      |
+-------------+-------------+-------------+-------------+-------------+ - -
^                                                      ^^
|                                                      ||
+------------------------------------------------------++-------------- - -
                       long word #1                      long word #2

When the second long word is read, the program accesses bytes that it shouldn't in fact be accessing... isn't this wrong? I'm pretty confident that Apple and the BSD folks know what they are doing, so could someone please explain why this is correct?

One thing I've noticed is that beerboy asserted this to be undefined behavior (http://stackoverflow.com/questions/11787810/strlen-performance-implementation/11856696#11856696), and I also believe it indeed is, but he was told that it isn't, because "we align to word size with the initial for loop" (not shown here). However, I don't see at all why alignment would be relevant if the array is not long enough and we are reading past its end.

Solution

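Although this is technically undefined behavior, in practice no native architecture checks for out-of-bounds memory access at a finer granularity than the size of a word; hardware memory protection actually works on whole pages (commonly 4 KiB). Because the page size is a multiple of the word size, an aligned word never straddles a page boundary, so if the byte holding the terminator is readable, the entire word containing it is readable too. That is why the initial byte-by-byte loop that aligns p matters. So while garbage past the terminator may end up being read, the result will not be a crash, and the garbage cannot change the returned length: testbyte() examines the bytes in address order and returns at the first '\0'.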