How to get gcc to generate decent code that checks if a buffer is full of NUL bytes?

Views: 115 | Posted: 2016/8/23 11:02:45 | Tags: c gcc x86 micro-optimization

Problem description

I'm implementing a program that parses tape archives. Part of the parser logic is checking for an end-of-archive marker which is a 512-byte block full of NUL bytes. I wrote the following code for this purpose, expecting gcc to optimize this well:

int is_eof_block(const char usth[static 512])
{
    size_t i;

    for (i = 0; i < 512; i++)
        if (usth[i] != '\0')
            return 0;

    return 1;
}

But to my surprise, gcc still generates terrible code for that, even though I explicitly allow it to access the whole 512 bytes in the buffer:

is_eof_block:
    leaq    512(%rdi), %rax
    jmp .L239
    .p2align 4,,10
.L243:
    addq    $1, %rdi
    cmpq    %rax, %rdi
    je  .L242
.L239:
    cmpb    $0, (%rdi)
    je  .L243
    xorl    %eax, %eax
    ret
    .p2align 4,,10
.L242:
    movl    $1, %eax
    ret

I expected gcc to generate something like this or even SIMD code:

is_eof_block:
    mov $64,%ecx
    xor %eax,%eax
    repz scasq
    setz %al
    ret

How can I rewrite the code such that it is still portable (as in: does not use non-C99 language extensions and works on architectures that do not support misaligned memory access) but compiles to better machine code on common architectures such as amd64 and AArch32?
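One direction that stays within C99 is to remove the early return, so the loop has a fixed trip count and no data-dependent exit. The sketch below is only an illustration of that idea: the name is_eof_block_or is hypothetical, and whether a given gcc version actually vectorizes the loop depends on the compiler version and flags, so the generated assembly should be inspected rather than assumed.

```c
#include <stddef.h>

/* Hypothetical variant of is_eof_block (the name is an assumption, not from
 * the original post). It ORs all 512 bytes together instead of returning at
 * the first nonzero byte, so the loop always runs exactly 512 iterations. */
int is_eof_block_or(const char buf[static 512])
{
    unsigned char acc = 0;
    size_t i;

    for (i = 0; i < 512; i++)
        acc |= (unsigned char)buf[i];   /* any nonzero byte sets a bit in acc */

    return acc == 0;                    /* 1 if every byte was NUL, else 0 */
}
```

The trade-off is that this version always reads the whole block, even when the first byte is already nonzero, so it may be slower than the original loop on blocks that differ early.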

I wrote the following microbenchmark to demonstrate the time difference. You can define MISALIGNED to a positive integer to test with misaligned buffers.

#include <stdio.h>
#include <stdlib.h>   /* EXIT_SUCCESS, EXIT_FAILURE */
#include <time.h>

#define TESTS 10000000
#ifndef MISALIGNED
# define MISALIGNED 0
#endif

char testarray[512 + MISALIGNED];

extern int is_eof_block(const char[static 512]);

int main()
{
    size_t i, j;
    clock_t begin, end;

    fprintf(stderr, "testing %d times\n", TESTS);
    fprintf(stderr, "no byte set to 1... ");
    begin = clock();

    for (i = 0; i < TESTS; i++)
        if (!is_eof_block(testarray + MISALIGNED)) {
            fprintf(stderr, "\nWrong test result in iteration %zu!\n", i);
            return EXIT_FAILURE;
        }

    end = clock();
    fprintf(stderr, "%fs\n", (end - begin) / (double)CLOCKS_PER_SEC);

    fprintf(stderr, "with non-null byte... ");
    begin = clock();

    for (i = j = 0; i < TESTS; i++) {
        testarray[MISALIGNED + j] = '\0';
        j = (j + 47) & 511;
        testarray[MISALIGNED + j] = '1';

        if (is_eof_block(testarray + MISALIGNED)) {
            fprintf(stderr, "\nWrong test result in iteration %zu!\n", i);
            return EXIT_FAILURE;
        }       
    }

    end = clock();
    fprintf(stderr, "%fs\n", (end - begin) / (double)CLOCKS_PER_SEC);

    return EXIT_SUCCESS;
}

is_eof_block_c.c

#include <stddef.h>

int is_eof_block(const char test[static 512])
{
    size_t i;

    for (i = 0; i < 512; i++)
        if (test[i] != '\0')
            return 0;

    return 1;
}

is_eof_block_asm.s

    .text
    .globl is_eof_block
    .type is_eof_block,@function

    .align 16
is_eof_block:
    mov $64,%ecx
    xor %eax,%eax
    repz scasq
    setz %al
    ret
    .size is_eof_block,.-is_eof_block

Here is the output with the C implementation of is_eof_block linked in:

testing 10000000 times
no byte set to 1... 2.281250s
with non-null byte... 1.195312s

And here is the assembly version:

testing 10000000 times
no byte set to 1... 0.476562s
with non-null byte... 0.320312s

Both were compiled with gcc 5, with -O3 as the sole optimization option. Passing various -march=... flags didn't change the generated code. The difference is about a factor of four. With a misaligned buffer, the assembly implementation is roughly 3% slower, whereas the C implementation shows no difference.
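Since the measurements above include misaligned buffers, one portable way to examine eight bytes per iteration without assuming any alignment is to copy each chunk into a local integer with memcpy. This is a sketch under assumptions: the name is_eof_block_words is hypothetical, uint64_t is an optional type in C99 (though present on common platforms such as amd64 and AArch32), and whether the memcpy call becomes a single load depends on the compiler and target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical word-at-a-time variant (the name is an assumption). */
int is_eof_block_words(const char buf[static 512])
{
    size_t i;

    /* 512 is a multiple of sizeof(uint64_t), so no tail handling is needed. */
    for (i = 0; i < 512; i += sizeof(uint64_t)) {
        uint64_t word;

        /* memcpy into a local is well defined for any alignment of buf. */
        memcpy(&word, buf + i, sizeof word);
        if (word != 0)
            return 0;
    }

    return 1;
}
```

On targets that do not support misaligned loads, the compiler may have to expand the memcpy into byte loads, so the benefit there is not guaranteed and is worth checking with the benchmark.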
