Vectorized strlen getting away with reading unallocated memory
Question
While studying OSX 10.9.4's implementation of strlen, I noticed that it always compares a chunk of 16 bytes and skips ahead to the following 16 bytes until it encounters a '\0'. The relevant part:
3de0: 48 83 c7 10 add $0x10,%rdi
3de4: 66 0f ef c0 pxor %xmm0,%xmm0
3de8: 66 0f 74 07 pcmpeqb (%rdi),%xmm0
3dec: 66 0f d7 f0 pmovmskb %xmm0,%esi
3df0: 85 f6 test %esi,%esi
3df2: 74 ec je 3de0 <__platform_strlen+0x40>
0x10 is 16 bytes in hex.
When I saw that, I wondered: this memory could just as well not be allocated. If I had allocated a C string of 20 bytes and passed it to strlen, it would read 36 bytes of memory. Why is it allowed to do that? I started looking and found "How dangerous is it to access an array out of bounds?"
That confirmed that it's definitely not always a good thing: unallocated memory might be unmapped, for example. Yet there must be something that makes this work. Some of my hypotheses:
- OSX not only guarantees that its allocations are 16-byte aligned, but also that the "quantum" of an allocation is a 16-byte chunk. Said another way, allocating 5 bytes will actually allocate 16 bytes, and allocating 20 bytes will actually allocate 32 bytes.
- It's not harmful per se to read off the end of an array when you're writing asm, as it's not undefined behaviour, as long as it's within bounds (within a page?).
What is the real reason?
EDIT: just found Why I'm getting read and write permission on unallocated memory?, which seems to indicate my first guess was right.
EDIT 2: Stupidly enough, I had forgotten that even though Apple seems to have removed the source of most of its asm implementations (Where did OSX's x86-64 assembly libc routines go?), it left strlen: http://www.opensource.apple.com/source/Libc/Libc-997.90.3/x86_64/string/strlen.s
In the comments, we find:
// returns the length of the string s (i.e. the distance in bytes from
// s to the first NUL byte following s). We look for NUL bytes using
// pcmpeqb on 16-byte aligned blocks. Although this may read past the
// end of the string, because all access is aligned, it will never
// read past the end of the string across a page boundary, or even
// accross a cacheline.
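The approach that comment describes can be sketched in C with SSE2 intrinsics. This is only an illustration under stated assumptions, not Apple's code: the function name `sketch_strlen_sse2` is hypothetical, the target is assumed to be x86 with SSE2, and the way the first (possibly unaligned) block is handled is one plausible choice rather than what the real routine necessarily does.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_load_si128, _mm_cmpeq_epi8, ... */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of an aligned, 16-bytes-at-a-time strlen.
   Assumes an x86 target with SSE2; not Apple's implementation. */
size_t sketch_strlen_sse2(const char *s)
{
    const __m128i zero = _mm_setzero_si128();
    uintptr_t addr = (uintptr_t)s;

    /* Round down to the 16-byte aligned block that contains s. */
    const char *block = (const char *)(addr & ~(uintptr_t)15);
    unsigned skip = (unsigned)(addr & 15);

    /* Compare all 16 bytes of the first block against zero (pcmpeqb),
       collect one bit per byte (pmovmskb), then clear the bits that
       belong to bytes located before s. */
    __m128i chunk = _mm_load_si128((const __m128i *)block);
    unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
    mask &= 0xFFFFu << skip;

    /* Same loop shape as the disassembly: advance 16 bytes, compare, test. */
    while (mask == 0) {
        block += 16;
        chunk = _mm_load_si128((const __m128i *)block);
        mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
    }

    /* The lowest set bit is the offset of the first NUL within the block. */
    unsigned idx = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        ++idx;
    }
    return (size_t)((uintptr_t)block + idx - addr);
}
```

Every load here is a 16-byte aligned load, so the function may read bytes before or after the string, but only bytes in the same aligned block as some byte of the string. From the C standard's point of view such reads are still out of bounds; the sketch relies on the hardware behaviour discussed in this thread, which is why real implementations are written in assembly.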
EDIT: I honestly think all answerers deserved an accepted answer, and basically all contained the information necessary to understand the issue. So I went for the answer of the person that had the least reputation.
Recommended answer
Reading memory on most architectures only has a side effect if the address being read corresponds to a page that is not mapped. Most strlen implementations for modern computers try to do only aligned reads of however many bytes. Because page sizes are multiples of 16, an aligned 16-byte block lies entirely within one page, and that page must be mapped because it holds at least one byte of the string. They will never do a 16-byte read straddling two pages, and so they will never elicit any side effect. So it's cool.
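The "never straddles two pages" claim is plain address arithmetic, which a small C sketch can make concrete. The helper names `access_crosses_page` and `aligned_block_16` are hypothetical, and the 4096-byte page size used in the usage note is an assumption; the argument only needs the page size to be a multiple of 16.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the n-byte access starting at addr touches more than one page.
   Assumes n >= 1 and page_size > 0. */
bool access_crosses_page(uintptr_t addr, uintptr_t n, uintptr_t page_size)
{
    uintptr_t first_page = addr / page_size;
    uintptr_t last_page = (addr + n - 1) / page_size;
    return first_page != last_page;
}

/* Start address of the 16-byte aligned block that contains addr. */
uintptr_t aligned_block_16(uintptr_t addr)
{
    return addr & ~(uintptr_t)15;
}
```

For example, with an assumed 4096-byte page, an unaligned 16-byte read starting at address 4090 crosses into the next page, while the aligned block containing that address starts at 4080 and ends at 4095, the last byte of the same page.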