Alignment of vectors in LLVM's amd64 output
Question
I'm trying to use vectors inside structs with LLVM. I have the following C definition of my struct:
struct Foo
{
    uint32_t len;
    uint32_t data[32] __attribute__ ((aligned (16)));
};
And here's some LLVM code to add 42 to element number 3 of the data field:
%Foo = type { i32, <32 x i32> }

define void @process(%Foo*) {
_L1:
  %data = getelementptr %Foo* %0, i32 0, i32 1
  %vec = load <32 x i32>* %data
  %x = extractelement <32 x i32> %vec, i32 3
  %xNew = add i32 42, %x
  %vecNew = insertelement <32 x i32> %vec, i32 %xNew, i32 3
  store <32 x i32> %vecNew, <32 x i32>* %data
  ret void
}
However, the output of llc is as if vectors had to be aligned at 128 bytes, which seems wasteful, and also wrong (AFAIK vectors should be 16-byte-aligned):
.file "process.bc"
.text
.globl process
.align 16, 0x90
.type process,@function
process: # @process
.Leh_func_begin0:
# BB#0: # %_L1
movdqa 128(%rdi), %xmm0
pextrd $3, %xmm0, %eax
addl $42, %eax
pinsrd $3, %eax, %xmm0
movdqa %xmm0, 128(%rdi)
ret
.Ltmp0:
.size process, .Ltmp0-process
.Leh_func_end0:
Of course, if I change the C definition to also align the data field at 128 bytes, it works, but wasting 124 bytes (compared to 12 if using 16-byte alignment) just seems wrong. So what's going on here?
Recommended answer
I think your GEPs are a little off for the best codegen. Here's some C code that does something similar:
#include <stdint.h>

struct Foo
{
    uint32_t len;
    uint32_t data[32] __attribute__ ((aligned (16)));
};

void foo(struct Foo *F)
{
    F->data[3] = 4;
}
which clang turns into this:
; ModuleID = 'foo.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-apple-darwin10.0.0"
%struct.Foo = type { i32, [12 x i8], [32 x i32] }
define void @foo(%struct.Foo* %F) nounwind ssp {
  %1 = alloca %struct.Foo*, align 8
  store %struct.Foo* %F, %struct.Foo** %1, align 8
  %2 = load %struct.Foo** %1, align 8
  %3 = getelementptr inbounds %struct.Foo* %2, i32 0, i32 2
  %4 = getelementptr inbounds [32 x i32]* %3, i32 0, i64 3
  store i32 4, i32* %4
  ret void
}
and the corresponding nice code you'd expect:
_foo: ## @foo
Leh_func_begin0:
## BB#0:
pushq %rbp
Ltmp0:
movq %rsp, %rbp
Ltmp1:
movl $4, 28(%rdi)
popq %rbp
ret
Leh_func_end0:
That said, the code llc generated for you isn't right; it should be:
_process: ## @process
Leh_func_begin1:
## BB#0: ## %_L1
movaps 16(%rdi), %xmm0
pextrd $3, %xmm0, %eax
addl $42, %eax
pinsrd $3, %eax, %xmm0
movaps %xmm0, 16(%rdi)
ret
It is even worse in ToT (top of tree), so a bug report wouldn't go amiss there.