Vectorized strlen getting away with reading unallocated memory


Question

While studying OSX 10.9.4's implementation of strlen, I noticed that it always compares a chunk of 16 bytes and skips ahead to the following 16 bytes until it encounters a '\0'. The relevant part:

3de0:   48 83 c7 10             add    $0x10,%rdi
3de4:   66 0f ef c0             pxor   %xmm0,%xmm0
3de8:   66 0f 74 07             pcmpeqb (%rdi),%xmm0
3dec:   66 0f d7 f0             pmovmskb %xmm0,%esi
3df0:   85 f6                   test   %esi,%esi
3df2:   74 ec                   je     3de0 <__platform_strlen+0x40>
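
For readers less used to AT&T syntax, here is a rough C sketch of what this loop computes. It is not Apple's code: nul_mask16, skip_to_nul_block and demo_nul_block_offset are hypothetical names, the SSE2 path assumes an x86 compiler that defines __SSE2__, and the caller is assumed to pass a 16-byte aligned buffer whose size is a multiple of 16, so that every block read stays inside the object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics, x86 and x86-64 only */
#endif

/* Bit i of the result is set when block[i] == '\0'.
   Assumption: block is 16-byte aligned and all 16 bytes are readable. */
static unsigned nul_mask16(const char *block)
{
    assert(((uintptr_t)block & 15) == 0);
#if defined(__SSE2__)
    __m128i zero  = _mm_setzero_si128();                      /* pxor %xmm0,%xmm0 */
    __m128i chunk = _mm_load_si128((const __m128i *)block);   /* aligned 16-byte load */
    __m128i eq    = _mm_cmpeq_epi8(chunk, zero);              /* pcmpeqb (%rdi),%xmm0 */
    return (unsigned)_mm_movemask_epi8(eq);                   /* pmovmskb %xmm0,%esi */
#else
    unsigned mask = 0;                                        /* portable scalar model */
    for (int i = 0; i < 16; i++)
        if (block[i] == '\0')
            mask |= 1u << i;
    return mask;
#endif
}

/* Hypothetical helper: step through aligned blocks until one holds a NUL.
   The disassembly advances the pointer before each compare; this sketch
   compares first, which visits the same blocks when started one block earlier. */
static const char *skip_to_nul_block(const char *block)
{
    while (nul_mask16(block) == 0)   /* test %esi,%esi ; je 3de0 */
        block += 16;                 /* add $0x10,%rdi */
    return block;
}

/* Usage example on an aligned, padded buffer with a NUL at index 37. */
static size_t demo_nul_block_offset(void)
{
    static _Alignas(16) char buf[48];
    memset(buf, 'a', sizeof buf);
    buf[37] = '\0';
    return (size_t)(skip_to_nul_block(buf) - buf);
}
```

The loop only identifies which block contains the terminator; locating the exact byte inside that block from the mask is a separate step that the excerpt does not show.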

0x10 is 16 in hex, so each iteration advances the pointer by 16 bytes.

When I saw that, I wondered: this memory could just as well be unallocated. If I had allocated a C string of 20 bytes and passed it to strlen, it would read 36 bytes of memory. Why is it allowed to do that? I started looking and found "How dangerous is it to access an array out of bounds?"

That confirmed it's definitely not always a good thing: unallocated memory might be unmapped, for example. Yet there must be something that makes this work. Some of my hypotheses:

  • OSX not only guarantees that its allocations are 16-byte aligned, but also that the "quantum" of an allocation is a 16-byte chunk. Said another way, allocating 5 bytes will actually allocate 16 bytes, and allocating 20 bytes will actually allocate 32 bytes.
  • It's not harmful per se to read off the end of an array when you're writing asm, as it's not undefined behaviour there, as long as the read stays within bounds (within a page?).

What's the actual reason?

EDIT: just found "Why I'm getting read and write permission on unallocated memory?", which seems to indicate my first guess was right.

EDIT 2: Stupidly enough, I had forgotten that even though Apple seems to have removed the source of most of its asm implementations ("Where did OSX's x86-64 assembly libc routines go?"), it left strlen: http://www.opensource.apple.com/source/Libc/Libc-997.90.3/x86_64/string/strlen.s

In the comments, we find:

//  returns the length of the string s (i.e. the distance in bytes from
//  s to the first NUL byte following s).  We look for NUL bytes using
//  pcmpeqb on 16-byte aligned blocks.  Although this may read past the
//  end of the string, because all access is aligned, it will never
//  read past the end of the string across a page boundary, or even
//  accross a cacheline.
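
The arithmetic behind that comment can be checked with a small C sketch. The helper names are hypothetical, and the 4096-byte page and 64-byte cache line used in the usage note are assumed typical sizes, not values taken from the Apple source. The argument only needs the boundary to be a multiple of 16.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when the len bytes starting at start all lie in the same
   boundary-sized region (a page, a cache line, ...). */
static bool read_stays_within(uintptr_t start, size_t len, size_t boundary)
{
    assert(len > 0 && boundary > 0);
    return start / boundary == (start + len - 1) / boundary;
}

/* True when the 16-byte aligned block containing addr stays in one region. */
static bool aligned_block_stays_within(uintptr_t addr, size_t boundary)
{
    uintptr_t block = addr & ~(uintptr_t)15;   /* round down to a multiple of 16 */
    return read_stays_within(block, 16, boundary);
}

/* Counts the addresses in [lo, hi) whose aligned block crosses a boundary. */
static size_t count_aligned_crossings(uintptr_t lo, uintptr_t hi, size_t boundary)
{
    size_t crossings = 0;
    for (uintptr_t addr = lo; addr < hi; addr++)
        if (!aligned_block_stays_within(addr, boundary))
            crossings++;
    return crossings;
}
```

With an assumed 4096-byte page, count_aligned_crossings(0, 3 * 4096, 4096) finds no crossing, and the same holds for a 64-byte cache line. By contrast, an unaligned 16-byte read starting at offset 4095 spans two pages, which is exactly the case the aligned implementation avoids.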

EDIT: I honestly think all answerers deserved an accepted answer, since basically every answer contained the information necessary to understand the issue. So I went for the answer of the person with the least reputation.

Recommended answer

On most architectures, reading memory only has a side effect if the address being read falls in a page that is not mapped. Most strlen implementations for modern computers try to do only aligned reads of however many bytes they load at a time. An aligned 16-byte read never straddles two pages, so it never elicits any side effect. So it's cool.
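
To make the aligned-read idea concrete, here is a minimal C sketch of a strlen built only from aligned 16-byte blocks. It is a model, not any libc's actual code: all function names are hypothetical, nul_mask16 is a scalar stand-in for the pcmpeqb/pmovmskb pair, and because reading outside an object is undefined behaviour in C itself (unlike in hand-written asm), the sketch assumes the caller passes a string that lives in a 16-byte aligned buffer padded to a multiple of 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for pcmpeqb + pmovmskb: bit i is set when block[i] == '\0'. */
static unsigned nul_mask16(const char *block)
{
    unsigned mask = 0;
    for (int i = 0; i < 16; i++)
        if (block[i] == '\0')
            mask |= 1u << i;
    return mask;
}

/* Index of the lowest set bit; mask must be nonzero. */
static unsigned lowest_set_bit(unsigned mask)
{
    unsigned i = 0;
    assert(mask != 0);
    while ((mask & 1u) == 0) {
        mask >>= 1;
        i++;
    }
    return i;
}

/* Assumption: s points into a 16-byte aligned, 16-byte padded buffer. */
static size_t aligned_strlen_sketch(const char *s)
{
    uintptr_t addr = (uintptr_t)s;
    unsigned head = (unsigned)(addr & 15);              /* bytes before s in its block */
    const char *block = (const char *)(addr - head);    /* round down to 16 bytes */

    /* First block: ignore NUL bytes that sit before the start of the string. */
    unsigned mask = nul_mask16(block) & (~0u << head);

    while (mask == 0) {                                 /* later blocks are read whole */
        block += 16;
        mask = nul_mask16(block);
    }
    return (size_t)(block + lowest_set_bit(mask) - s);
}

/* Usage example: a 20-character string placed at offset 5 of an aligned buffer. */
static size_t demo_len(void)
{
    static _Alignas(16) char buf[64];                   /* zero-initialised padding */
    memcpy(buf + 5, "twenty characters!!!", 20);
    return aligned_strlen_sketch(buf + 5);
}
```

The sketch reads bytes both before the start of the string and after its terminator, but every read is a whole aligned block, so none of them can touch a page that the string itself does not already occupy.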
