Why are complicated memcpy/memset superior?


Question

When debugging, I frequently step into the handwritten assembly implementations of memcpy and memset. These are usually implemented using streaming instructions where available, unrolled loops, alignment optimizations, and so on. I also recently encountered this 'bug' caused by a memcpy optimization in glibc.

The question is: why can't the hardware manufacturers (Intel, AMD) recognize the specific cases of rep stos and rep movs as such, and perform the fastest possible fill and copy on their own architecture?
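For readers who have not used these instructions directly, here is a minimal sketch of what a copy and a fill built on rep movsb and rep stosb can look like. It assumes GCC or Clang extended inline assembly on an x86 target and falls back to a plain byte loop elsewhere. The function names copy_rep_movsb and fill_rep_stosb are hypothetical and chosen only for illustration; this is not how glibc implements memcpy or memset.

```c
#include <stddef.h>

/* Hypothetical helper: copy n bytes from src to dst with "rep movsb".
 * Assumes the direction flag is clear, as the x86 calling conventions
 * require on function entry. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* DI = destination, SI = source, CX = byte count; all three are
     * modified by the instruction, hence the "+" constraints. */
    __asm__ volatile ("rep movsb"
                      : "+D"(dst), "+S"(src), "+c"(n)
                      :
                      : "memory");
#else
    /* Portable fallback for other targets: plain byte loop. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}

/* Hypothetical helper: fill n bytes at dst with the low byte of value
 * using "rep stosb". */
static void *fill_rep_stosb(void *dst, int value, size_t n)
{
    void *ret = dst;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* DI = destination, CX = byte count, AL = byte to store. */
    __asm__ volatile ("rep stosb"
                      : "+D"(dst), "+c"(n)
                      : "a"(value)
                      : "memory");
#else
    unsigned char *d = dst;
    while (n--)
        *d++ = (unsigned char)value;
#endif
    return ret;
}
```

Both helpers return the original destination pointer, mirroring the memcpy and memset convention. Like memcpy, the copy helper is only meant for non-overlapping buffers.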

Recommended answer

One thing I'd like to add to the other answers is that rep movs is not actually slow on all modern processors. For instance, Agner Fog's optimization guide states:

Usually, the REP MOVS instruction has a large overhead for choosing and setting up the right method. Therefore, it is not optimal for small blocks of data. For large blocks of data, it may be quite efficient when certain conditions for alignment etc. are met. These conditions depend on the specific CPU (see page 143). On Intel Nehalem and Sandy Bridge processors, this is the fastest method for moving large blocks of data, even if the data are unaligned.

[Highlighting is mine.] Reference: Agner Fog, Optimizing subroutines in assembly language: An optimization guide for x86 platforms, p. 156 (see also section 16.10, p. 143) [version of 2011-06-08].
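The claim in the quote (high setup overhead for small blocks, good throughput for large ones) is easy to check on a given machine. The following is a rough timing sketch, not a rigorous benchmark: it assumes GCC or Clang on x86 (with a byte-loop fallback elsewhere), uses the standard clock() function, and the names copy_rep_movsb, copy_fn, and time_copy are hypothetical. Results will vary with the CPU, block size, and alignment, so no expected numbers are given.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Common signature shared by memcpy and the hypothetical helper below. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical helper: copy n bytes with "rep movsb" on x86,
 * or with a plain byte loop on other targets. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile ("rep movsb"
                      : "+D"(dst), "+S"(src), "+c"(n)
                      :
                      : "memory");
#else
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}

/* Returns the average CPU time in seconds for one call of fn copying
 * n bytes (n > 0, reps > 0), or -1.0 if allocation fails. */
static double time_copy(copy_fn fn, size_t n, int reps)
{
    unsigned char *src = malloc(n);
    unsigned char *dst = malloc(n);
    volatile unsigned char sink = 0;
    clock_t start, end;

    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, n);

    start = clock();
    for (int i = 0; i < reps; i++) {
        fn(dst, src, n);
        sink = dst[n - 1]; /* read the result so the copy is not discarded */
    }
    end = clock();
    (void)sink;

    free(src);
    free(dst);
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

To compare the two approaches, call time_copy(memcpy, n, reps) and time_copy(copy_rep_movsb, n, reps) for a range of sizes, for example from 16 bytes up to several megabytes, with enough repetitions that the coarse resolution of clock() does not dominate the measurement.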
