How much does function alignment actually matter on modern processors?

Question

When I compile C code with a recent compiler on an amd64 or x86 system, functions are aligned to a multiple of 16 bytes. How much does this alignment actually matter on modern processors? Is there a huge performance penalty associated with calling an unaligned function?

I ran the following microbenchmark (call.S):

// benchmarking performance penalty of function alignment.
#include <sys/syscall.h>

#ifndef SKIP
# error "SKIP undefined"
#endif

#define COUNT 1073741824

        .globl _start
        .type _start,@function
_start: mov $COUNT,%rcx
0:      call test
        dec %rcx
        jnz 0b
        mov $SYS_exit,%rax
        xor %edi,%edi
        syscall
        .size _start,.-_start

        .align 16
        .space SKIP
test:   nop
        rep
        ret
        .size test,.-test

with the following shell script:

#!/bin/sh

for i in `seq 0 15` ; do
        echo SKIP=$i
        cc -c -DSKIP=$i call.S
        ld -o call call.o
        time -p ./call
done

On a CPU that identifies itself as Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz according to /proc/cpuinfo, the offset made no difference for me: the benchmark consistently took 1.9 seconds to run.

On the other hand, on another system with a CPU that reports itself as an Intel(R) Core(TM) i7 CPU L 640 @ 2.13GHz, the benchmark takes 6.3 seconds, except with an offset of 14 or 15, where the code takes 7.2 seconds. I think that's because the function starts to span multiple cache lines.

Recommended answer

TL;DR: Cache alignment matters. You don't want to fetch bytes that you won't execute.

At the very least, you want to avoid fetching bytes that precede the first instruction you will execute. Because this is a micro-benchmark, you most likely won't see any difference, but imagine a full program in which a number of functions each incur an extra cache miss: the first byte wasn't aligned to a cache line, so a new cache line has to be fetched for the last N bytes of the function (where N <= the number of bytes before the function that you cached but didn't use).
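The arithmetic behind this can be sketched in a few lines of C. This is only an illustration, not part of the original benchmark: the helper names blocks_spanned and wasted_prefix are made up here, and the 64-byte cache line is an assumption that holds for typical x86 parts but should be checked for a given CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Number of aligned blocks of `block` bytes touched by a function that
 * starts `offset` bytes past an aligned address and is `size` bytes long.
 * Use block = 64 for a typical x86 cache line (an assumption), or
 * block = 16 for the alignment boundary used in the question. */
size_t blocks_spanned(size_t offset, size_t size, size_t block)
{
    assert(block > 0 && size > 0);
    size_t first = offset / block;              /* block holding the first byte */
    size_t last = (offset + size - 1) / block;  /* block holding the last byte */
    return last - first + 1;
}

/* Bytes in the first block that are fetched but lie before the
 * function's first instruction. */
size_t wasted_prefix(size_t offset, size_t block)
{
    assert(block > 0);
    return offset % block;
}
```

For example, blocks_spanned(16, 64, 64) is 2: a 64-byte function that starts 16 bytes into a line needs a second line for its tail, whereas the same function at offset 0 needs only one. Applied to the question's benchmark, the 3-byte test function (nop, rep, ret) gives blocks_spanned(13, 3, 16) == 1 but blocks_spanned(14, 3, 16) == 2, so SKIP values of 14 and 15 are exactly the ones where it straddles a 16-byte boundary. Whether a cache-line boundary is crossed as well depends on the absolute address, which the benchmark does not pin down.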

The Intel optimization manual (http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf) says this:

3.4.1.5 Code Alignment

Careful arrangement of code can enhance cache and memory locality. Likely sequences of basic blocks should be laid out contiguously in memory. This may involve removing unlikely code, such as code to handle error conditions, from the sequence. See Section 3.7, "Prefetching," on optimizing the instruction prefetcher.

Assembly/Compiler Coding Rule 12. (M impact, H generality) All branch targets should be 16-byte aligned.

Assembly/Compiler Coding Rule 13. (M impact, H generality) If the body of a conditional is not likely to be executed, it should be placed in another part of the program. If it is highly unlikely to be executed and code locality is an issue, it should be placed on a different code page.
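In C, both rules can be approximated with compiler extensions rather than hand-written assembly. The sketch below assumes GCC or Clang; the function names sum_positive and report_error are hypothetical. The aligned attribute requests a minimum alignment for the function's entry point (GCC's -falign-functions=16 option applies a similar request to every function), and the cold attribute together with __builtin_expect marks the error path as unlikely, which typically lets an optimizing build move it away from the hot code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rule 13: keep the unlikely error path out of line. `cold` is a
 * GCC/Clang extension hinting that the function is rarely executed. */
__attribute__((cold, noinline))
int report_error(int bad_value)
{
    (void)bad_value;   /* hypothetical error handling would go here */
    return -1;
}

/* Rule 12: request 16-byte alignment for this function's entry point.
 * `aligned` on a function is a GCC/Clang extension. */
__attribute__((aligned(16)))
long sum_positive(const int *values, size_t count)
{
    assert(values != NULL || count == 0);
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        /* Tell the compiler that a negative value is the unlikely case. */
        if (__builtin_expect(values[i] < 0, 0))
            return report_error(values[i]);
        total += values[i];
    }
    return total;
}
```

Whether the cold function actually ends up in a separate section or on a different page depends on the compiler, optimization level, and linker, so inspect the generated object file (for example with objdump) rather than relying on the attributes alone.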

The manual's emphasis on cache locality also helps explain why you don't notice any difference in your benchmark: all of the code is cached once and never leaves the cache (modulo context switches, of course).
