Better way to load vectors from memory (clang)


Question

I'm writing a test program to get used to Clang's language extensions for OpenCL-style vectors. The code works, but there is one aspect I can't get right: I can't figure out how to get clang to load a vector directly from a scalar array.

What I'm doing now:

byte16 va = (byte16){ argv[1][start], argv[1][start + 1], argv[1][start + 2], 
                      argv[1][start + 3], argv[1][start + 4], argv[1][start + 5], 
                      argv[1][start + 6], argv[1][start + 7], argv[1][start + 8],
                      argv[1][start + 9], argv[1][start + 10], argv[1][start + 11],
                      argv[1][start + 12], argv[1][start + 13], argv[1][start + 14],
                      argv[1][start + 15]};

Ideally I'd like something like this:

byte16 va = *(byte16 *)(&(argv[1][start]));

That is easy to do with the proper intrinsics for ARM or x86. But the cast above crashes the program, even though it compiles.

Recommended answer

One reason the crash might occur on x86 is alignment. I don't have clang on my system to reproduce the problem, but I can demonstrate it with GCC.

If you do something like:

/* Define a vector type of 16 characters.  */
typedef char __attribute__ ((vector_size (16))) byte16;

/* Global pointer.  */
char *  foo;

byte16 test ()
{
  return *(byte16 *)&foo[1];
}

Now, if you compile it on a vector-capable x86 machine with:

$  gcc -O3 -march=native -mtune=native   a.c

you will get the following assembly for test:

test:
    movq foo(%rip), %rax
    vmovdqa 1(%rax), %xmm0
    ret

Note that the move is an aligned one (vmovdqa), which is of course wrong: nothing guarantees that foo + 1 is 16-byte aligned. Now, if you inline this function into main, you get something like:

int main ()
{
  foo = __builtin_malloc (22);
  byte16 x = *(byte16 *)&foo[1];
  return x[0];
}

In that case you will be fine: the compiler emits an unaligned instruction. This is something of a bug, and it has no very good fix in the compiler, since fixing it would require interprocedural optimisations, new data structures, and so on.

The root of the problem is that the compiler assumes vector types are aligned, so when you dereference a pointer to an aligned vector type it is free to use an aligned move. As a workaround in GCC, you can define an unaligned vector type like:

typedef char __attribute__ ((vector_size (16),aligned (1))) unaligned_byte16;

and use it to dereference unaligned memory.

I'm not sure you are hitting exactly this problem in your setup, but I would recommend checking for it by inspecting the assembly output of your compiler.
