How do I stop GCC from optimizing this byte-for-byte copy into a memcpy call?
I have this code for memcpy as part of my implementation of the standard C library. It copies memory from src to dest one byte at a time:
#include <stddef.h>  /* size_t */

void *memcpy(void *restrict dest, const void *restrict src, size_t len)
{
    char *dp = (char *restrict)dest;
    const char *sp = (const char *restrict)src;
    while (len--)
    {
        *dp++ = *sp++;  /* copy one byte per iteration */
    }
    return dest;
}
With gcc -O2, the generated code is reasonable:
memcpy:
.LFB0:
movq %rdi, %rax
testq %rdx, %rdx
je .L2
xorl %ecx, %ecx
.L3:
movzbl (%rsi,%rcx), %r8d
movb %r8b, (%rax,%rcx)
addq $1, %rcx
cmpq %rdx, %rcx
jne .L3
.L2:
ret
.LFE0:
However, at gcc -O3, GCC optimizes this naive byte-for-byte copy into a memcpy call:
memcpy:
.LFB0:
testq %rdx, %rdx
je .L7
subq $8, %rsp
call memcpy
addq $8, %rsp
ret
.L7:
movq %rdi, %rax
ret
.LFE0:
This won't work: for any nonzero len, memcpy unconditionally calls itself, and it causes a segfault.
I've tried passing -fno-builtin-memcpy and -fno-loop-optimizations, and the same thing occurs.
I'm using GCC version 8.3.0:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-cros-linux-gnu/8.3.0/lto-wrapper
Target: x86_64-cros-linux-gnu
Configured with: ../configure --prefix=/usr/local --libdir=/usr/local/lib64 --build=x86_64-cros-linux-gnu --host=x86_64-cros-linux-gnu --target=x86_64-cros-linux-gnu --enable-checking=release --disable-multilib --enable-threads=posix --disable-bootstrap --disable-werror --disable-libmpx --enable-static --enable-shared --program-suffix=-8.3.0 --with-arch-64=x86-64
Thread model: posix
gcc version 8.3.0 (GCC)
How do I disable the optimization that causes the copy to be transformed into a memcpy call?
One thing that seems to be sufficient here: instead of -fno-builtin-memcpy, use -fno-builtin when compiling the translation unit that contains memcpy, and only that one.
An alternative would be to pass -fno-tree-loop-distribute-patterns, though this might be brittle, as it forbids the compiler from first reorganizing the loop code and then replacing parts of it with calls to mem* functions.
Or, since you cannot rely on anything in the C library, perhaps using -ffreestanding would be in order.
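If changing the flags for a whole translation unit is inconvenient, the second option can probably be scoped to a single function with GCC's optimize function attribute. The following is only a minimal sketch and is not taken from the answer above: it assumes GCC (the macro expands to nothing on Clang, which does not implement this attribute), and both the macro name NO_LOOP_TO_LIBCALL and the function name my_memcpy are hypothetical, chosen so the example can be compiled next to a hosted C library without clashing with the real memcpy.

```c
#include <stddef.h>  /* size_t */

/* Assumption: GCC. The optimize attribute applies the given -f option
   to this one function only. Clang does not implement it, so the macro
   is left empty there. NO_LOOP_TO_LIBCALL is a hypothetical name. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_TO_LIBCALL \
    __attribute__((optimize("-fno-tree-loop-distribute-patterns")))
#else
#define NO_LOOP_TO_LIBCALL
#endif

/* my_memcpy is a hypothetical stand-in for the library's own memcpy.
   The body is the same byte-for-byte loop as in the question. */
NO_LOOP_TO_LIBCALL
void *my_memcpy(void *restrict dest, const void *restrict src, size_t len)
{
    char *dp = (char *restrict)dest;
    const char *sp = (const char *restrict)src;
    while (len--)
    {
        *dp++ = *sp++;  /* copy one byte per iteration */
    }
    return dest;
}
```

In the actual library the function would be named memcpy. GCC's documentation describes the optimize attribute as a debugging aid, so it seems wise to check the result by compiling with gcc -O3 -S and confirming that the output contains the byte loop rather than a call to memcpy.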