Compilation fails with OpenMP on Mac OS X Lion (memcpy and SSE intrinsics)


Problem description

I have stumbled upon the following problem. The code snippet below does not link on Mac OS X with any Xcode version I tried (4.4, 4.5):

#include <stdlib.h>
#include <string.h>
#include <emmintrin.h>

int main(int argc, char *argv[])
{
  char *temp;
#pragma omp parallel
  {
    __m128d v_a, v_ar;
    memcpy(temp, argv[0], 10);
    v_ar = _mm_shuffle_pd(v_a, v_a, _MM_SHUFFLE2 (0,1));
  }
}

The code is provided only as an example and would segfault if you ran it. The point is that it does not build: compilation succeeds, but the link step fails. The build is done using the following command line:

/Applications/Xcode.app/Contents/Developer/usr/bin/gcc test.c -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.7.sdk -mmacosx-version-min=10.7 -fopenmp

 Undefined symbols for architecture x86_64:
"___builtin_ia32_shufpd", referenced from:
    _main.omp_fn.0 in ccJM7RAw.o
"___builtin_object_size", referenced from:
    _main.omp_fn.0 in ccJM7RAw.o
ld: symbol(s) not found for architecture x86_64
collect2: ld returned 1 exit status

The code builds just fine when the -fopenmp flag is not passed to gcc. I googled around and found a solution for the first problem, the one connected with memcpy: adding -fno-builtin or -D_FORTIFY_SOURCE=0 to the gcc argument list. I did not manage to solve the second problem (the SSE intrinsic).

Can anyone help me to solve this? The questions:

  • most importantly: how to get rid of the "___builtin_ia32_shufpd" error?
  • what exactly is the reason for the memcpy problem, and what does the -D_FORTIFY_SOURCE=0 flag eventually do?

Solution

This is a bug in the way Apple's LLVM-backed GCC (llvm-gcc) transforms OpenMP regions and handles calls to built-ins inside them. The problem can be diagnosed by examining the intermediate tree dumps (obtainable by passing the -fdump-tree-all option to gcc). Without OpenMP enabled, the following final code representation is generated (from test.c.016t.fap):

main (argc, argv)
{
  D.6544 = __builtin_object_size (temp, 0);
  D.6545 = __builtin_object_size (temp, 0);
  D.6547 = __builtin___memcpy_chk (temp, D.6546, 10, D.6545);
  D.6550 = __builtin_ia32_shufpd (v_a, v_a, 1);
}

This is a C-like representation of how the compiler sees the code internally after all transformations. It is what then gets turned into assembly instructions. (Only the lines that refer to the built-ins are shown here.)

With OpenMP enabled, the parallel region is extracted into its own function, main.omp_fn.0:

main.omp_fn.0 (.omp_data_i)
{
  void * (*<T4f6>) (void *, const <unnamed type> *, long unsigned int, long unsigned int) __builtin___memcpy_chk.21;
  long unsigned int (*<T4f5>) (const <unnamed type> *, int) __builtin_object_size.20;
  vector double (*<T6b5>) (vector double, vector double, int) __builtin_ia32_shufpd.23;
  long unsigned int (*<T4f5>) (const <unnamed type> *, int) __builtin_object_size.19;

  __builtin_object_size.19 = __builtin_object_size;
  D.6587 = __builtin_object_size.19 (D.6603, 0);
  __builtin_ia32_shufpd.23 = __builtin_ia32_shufpd;
  D.6593 = __builtin_ia32_shufpd.23 (v_a, v_a, 1);
  __builtin_object_size.20 = __builtin_object_size;
  D.6588 = __builtin_object_size.20 (D.6605, 0);
  __builtin___memcpy_chk.21 = __builtin___memcpy_chk;
  D.6590 = __builtin___memcpy_chk.21 (D.6609, D.6589, 10, D.6588);
}

Again, I have left only the code that refers to the built-ins. What is apparent (although the reason for it is not immediately apparent to me) is that the OpenMP code transformer really insists on calling all the built-ins through function pointers. These pointer assignments:

__builtin_object_size.19 = __builtin_object_size;
__builtin_ia32_shufpd.23 = __builtin_ia32_shufpd;
__builtin_object_size.20 = __builtin_object_size;
__builtin___memcpy_chk.21 = __builtin___memcpy_chk;

generate external references to symbols which are not really symbols but rather names that get special treatment by the compiler. The linker then tries to resolve them but is unable to find any of the __builtin_* names in any of the object files that the code is linked against. This is also observable in the assembly code that one can obtain by passing -S to gcc:

LBB2_1:
    movapd  -48(%rbp), %xmm0
    movl    $1, %eax
    movaps  %xmm0, -80(%rbp)
    movaps  -80(%rbp), %xmm1
    movl    %eax, %edi
    callq   ___builtin_ia32_shufpd
    movapd  %xmm0, -32(%rbp)

This is basically a function call that takes three arguments: one integer in %edi (loaded by way of %eax) and two XMM arguments in %xmm0 and %xmm1, with the result returned in %xmm0 (as per the SysV AMD64 ABI function calling convention). In contrast, the code generated without -fopenmp is an instruction-level expansion of the intrinsic, as it is supposed to be:

LBB1_3:
    movapd  -64(%rbp), %xmm0
    shufpd  $1, %xmm0, %xmm0
    movapd  %xmm0, -80(%rbp)
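For reference, the immediate $1 in the shufpd instruction comes from _MM_SHUFFLE2(0,1), which evaluates to 1 and, when both operands are the same register, swaps the two double lanes. A minimal sketch of that behaviour follows; swap_lanes is a hypothetical helper written for illustration and is not part of the original program:

```c
#include <emmintrin.h>

/* Hypothetical helper: writes the two doubles of in[] to out[] in swapped order. */
static void swap_lanes(const double in[2], double out[2])
{
    __m128d v = _mm_loadu_pd(in);   /* low lane = in[0], high lane = in[1] */
    /* _MM_SHUFFLE2(0, 1) == 1: result low lane <- high lane of the first operand,
       result high lane <- low lane of the second operand. */
    __m128d r = _mm_shuffle_pd(v, v, _MM_SHUFFLE2(0, 1));
    _mm_storeu_pd(out, r);
}
```

Calling swap_lanes on {1.0, 2.0} should leave {2.0, 1.0} in the output array. When the compiler expands the intrinsic correctly, the shuffle becomes a single shufpd instruction rather than a call.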

What happens when you pass -D_FORTIFY_SOURCE=0 is that memcpy is not replaced by the "fortified" checking version, and a regular call to memcpy is used instead. This eliminates the references to __builtin_object_size and __builtin___memcpy_chk but cannot remove the call to the __builtin_ia32_shufpd built-in.
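To make the fortification step concrete, here is a rough sketch of what a fortified memcpy call turns into, judging by the tree dump. FORTIFIED_MEMCPY and checked_copy are hypothetical names used only for illustration; the real system headers define memcpy itself in a similar but more elaborate way, and the exact definition is an assumption here:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical macro, NOT the actual system header definition: it passes the
   compiler's estimate of the destination size to the checking variant. */
#define FORTIFIED_MEMCPY(dst, src, n) \
    __builtin___memcpy_chk((dst), (src), (n), __builtin_object_size((dst), 0))

/* Hypothetical helper. The size of *dst is generally unknown here, so
   __builtin_object_size typically yields (size_t)-1 and no check can fail;
   with a local array as destination the real size would be passed instead. */
static void *checked_copy(void *dst, const void *src, size_t n)
{
    return FORTIFIED_MEMCPY(dst, src, n);
}
```

With -D_FORTIFY_SOURCE=0 the headers skip this rewriting, so only a plain memcpy call remains and neither built-in is referenced.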

This is obviously a compiler bug. If you really really really must use Apple's GCC to compile the code, then an interim solution would be to move the offending code to an external function as the bug apparently only affects code that gets extracted from parallel regions:

void func(char *temp, char *argv0)
{
   __m128d v_a, v_ar;
   memcpy(temp, argv0, 10);
   v_ar = _mm_shuffle_pd(v_a, v_a, _MM_SHUFFLE2 (0,1));
}

int main(int argc, char *argv[])
{
  char *temp;
#pragma omp parallel
  {
    func(temp, argv[0]);
  }
}

The overhead of one additional function call is negligible compared to the overhead of entering and exiting the parallel region. You can use OpenMP pragmas inside func; they will work because of the dynamic scoping of the parallel region (such "orphaned" directives bind to whichever parallel region is active when the function is called).
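As an illustration of the orphaned-directive point, the sketch below puts a worksharing loop inside a separate function and calls it from a parallel region. The names scale_in_place and scale_parallel are hypothetical and do not appear in the original code:

```c
/* Hypothetical helper: scales an array in place. The "omp for" below is
   orphaned: it binds to the parallel region active at the time of the call,
   so the iterations are divided among that region's threads. */
static void scale_in_place(double *data, int n, double factor)
{
    int i;
#pragma omp for
    for (i = 0; i < n; i++)
        data[i] *= factor;
}

/* Hypothetical driver: opens the parallel region and delegates the work. */
static void scale_parallel(double *data, int n, double factor)
{
#pragma omp parallel
    {
        scale_in_place(data, n, factor);
    }
}
```

Built without -fopenmp, the pragmas are ignored and the loop simply runs serially, so the result is the same either way.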

Maybe Apple will provide a fixed compiler in the future, or maybe they won't, given their commitment to replacing GCC with Clang.
