What is the minimum supported SSE flag that can be enabled on macOS?

Question

Most of the hardware I use these days supports SSE2. On Windows and Linux, I have some code to test for SSE support. I read somewhere that macOS has supported SSE for a long time, but I don't know the minimum version that can be enabled. The final binary will be copied to other macOS machines, so I cannot use -march=native like with GCC.

If it is enabled by default on all builds, do I have to pass the -msse or -msse2 flags when building my code?

Here is my compiler version:

Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin14.1.0
Thread model: posix

Here is the output of uname -a

uname -a
Darwin mme.local 14.1.0 Darwin Kernel Version 14.1.0: Mon Dec 22 23:10:38 PST 2014; root:xnu-2782.10.72~2/RELEASE_X86_64 x86_64

Here is the output of sysctl machdep.cpu.features

machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 DTES64 MON DSCPL VMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 POPCNT

Recommended answer

SSE2 is enabled by default for x86-64, because it's a required part of the x86-64 ISA.

Since Apple has never sold any AMD or Pentium4 CPUs, x86-64 on OS X also implies SSSE3 (first-gen Core2). The first x86 Macs were Core (not Core2), but they were 32-bit only. You unfortunately can't assume SSE4.1 or -mpopcnt.
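
One way to see what a given set of flags lets the compiler assume is to look at the predefined macros `__SSE2__`, `__SSSE3__`, `__SSE4_1__` and `__POPCNT__`, which GCC and Clang define when the matching extension is enabled. The following is a minimal sketch; the `ISA_*` flags and the function names are made up for illustration and are not part of any library.

```c
#include <assert.h>

/* Hypothetical bit flags for the ISA extensions this build may use freely. */
enum {
    ISA_SSE2   = 1 << 0,
    ISA_SSSE3  = 1 << 1,
    ISA_SSE4_1 = 1 << 2,
    ISA_POPCNT = 1 << 3
};

/* Returns 1 when this translation unit is compiled for x86-64, else 0. */
int targets_x86_64(void) {
#if defined(__x86_64__)
    return 1;
#else
    return 0;
#endif
}

/* Collects the extensions the compiler was allowed to assume at build time. */
unsigned compiled_isa_baseline(void) {
    unsigned mask = 0;
#if defined(__SSE2__)
    mask |= ISA_SSE2;
#endif
#if defined(__SSSE3__)
    mask |= ISA_SSSE3;
#endif
#if defined(__SSE4_1__)
    mask |= ISA_SSE4_1;
#endif
#if defined(__POPCNT__)
    mask |= ISA_POPCNT;
#endif
    return mask;
}

/* Sanity check: an x86-64 build should always report SSE2. */
void check_x86_64_baseline(void) {
    if (targets_x86_64())
        assert(compiled_isa_baseline() & ISA_SSE2);
}
```

Built with no extra flags for x86-64, the mask should contain `ISA_SSE2`. With -march=core2 I'd expect `ISA_SSSE3` to be set as well, but not `ISA_SSE4_1` or `ISA_POPCNT`.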

I'd suggest -march=core2 -mtune=haswell. (-mtune doesn't affect compatibility, and Haswell tuning shouldn't be bad for actual Core2 or Nehalem hardware. See http://agner.org/optimize/ and the links in the x86 tag wiki for microarchitecture details about which things in (compiler-generated) assembly language are fast or slow on different CPUs.)

(See How does mtune actually work? for an example of different tuning causing different instruction selection without changing the required ISA extensions.)

-march=core2 enables everything that core2 supports, not just SSSE3. Since you don't care about your code performing well on AMD CPUs (because it's OS X), you can tune for an Intel CPU. There's also -mtune=intel which is more generic, but Haswell should be reasonable.

You might be missing out on support for Hackintosh systems where someone installed OS X on an ancient CPU on non-Apple hardware, but IDK if OS X would work on an AMD Athlon64 / PhenomII, or Intel P4.

It would be nice to be able to enable some Nehalem stuff like -mpopcnt, but Core 2 first and 2nd gen (Conroe and Penryn) lacked that. Even SSE4.1 isn't available on first-gen Core 2.
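
If popcnt matters for a hot loop, one option that keeps the Core 2 baseline is to detect the feature at run time and fall back to portable code. The sketch below assumes a reasonably recent GCC or Clang (for `<cpuid.h>` with `__get_cpuid`, and for `__attribute__((target("popcnt")))`, which the compiler in the question may be too old to support). The function names are hypothetical; the bit positions are those of CPUID leaf 1, ECX.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID leaf 1, ECX feature bits. */
#define ECX_SSE4_1 (1u << 19)
#define ECX_POPCNT (1u << 23)

/* Returns ECX from CPUID leaf 1, or 0 when CPUID is not available. */
static unsigned cpuid_leaf1_ecx(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return ecx;
#endif
    return 0;
}

int cpu_has_sse4_1(void) { return (cpuid_leaf1_ecx() & ECX_SSE4_1) != 0; }
int cpu_has_popcnt(void) { return (cpuid_leaf1_ecx() & ECX_POPCNT) != 0; }

/* Portable fallback: clears the lowest set bit until none remain. */
unsigned popcount64_portable(uint64_t x) {
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        ++n;
    }
    return n;
}

#if defined(__x86_64__)
/* Only this function is allowed to use the popcnt instruction. */
__attribute__((target("popcnt")))
static unsigned popcount64_hw(uint64_t x) {
    return (unsigned)__builtin_popcountll(x);
}
#endif

/* Picks the hardware path only when the running CPU reports popcnt. */
unsigned popcount64(uint64_t x) {
#if defined(__x86_64__)
    if (cpu_has_popcnt())
        return popcount64_hw(x);
#endif
    return popcount64_portable(x);
}

/* Debug helper: both paths must agree on a few sample values. */
void popcount_selfcheck(void) {
    const uint64_t samples[] = { 0, 1, 0xF0F0, UINT64_MAX };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(popcount64(samples[i]) == popcount64_portable(samples[i]));
}
```

Real code would cache the CPUID result instead of querying it on every call; that is left out here to keep the sketch short.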

It's also possible to build a fat binary with baseline and Haswell slices, x86_64 and x86_64h. Stephen Cannon says (in comments below) that "the x86_64h slice will run automatically on Haswell and later µarches". (Slices for other uarches aren't currently an option, but most programs would get little benefit.)

Your x86_64 (non-Haswell) slice should probably build with -march=core2 -mtune=sandybridge.

Haswell introduced AVX2, FMA, and BMI2, so -march=haswell is a very good fit for Broadwell / Skylake / Kaby Lake / Coffee Lake. (It helps for tuning options as well as ISA extensions: gcc -march=haswell disables -mavx256-split-unaligned-load and -mavx256-split-unaligned-store, while -mavx with tune=default or sandybridge enables them. Splitting hurts on Haswell, especially when it creates shuffle-port bottlenecks, and it's really dumb when your data is almost always aligned, or really always aligned but you just didn't tell the compiler about it.)
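
With a fat binary, the same source file is compiled once per slice, so a compile-time check is enough to pick a code path for each one. The sketch below uses `__AVX2__` as a stand-in for "this is the Haswell slice"; that is a choice made for illustration (256-bit float addition itself only needs AVX), and `sum_path` and `sum_floats` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Names the code path this translation unit was compiled with. */
const char *sum_path(void) {
#if defined(__AVX2__)
    return "avx2";
#else
    return "baseline";
#endif
}

/* Sums n floats; uses 256-bit vectors only when the build allows them. */
float sum_floats(const float *data, size_t n) {
    size_t i = 0;
    float total = 0.0f;
    assert(data != NULL || n == 0);
#if defined(__AVX2__)
    __m256 acc = _mm256_setzero_ps();
    /* Unaligned loads, so the caller does not need aligned data. */
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    for (int k = 0; k < 8; ++k)
        total += lanes[k];
#endif
    /* Scalar loop: the whole array in the baseline build, the tail otherwise. */
    for (; i < n; ++i)
        total += data[i];
    return total;
}
```

The two paths add the elements in a different order, so results can differ in the last bits for general floating-point input.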

Broadwell introduced ADOX/ADCX which is pretty niche (run two extended-precision add dependency chains in parallel), and Skylake introduced clflushopt which isn't widely useful.

Skylake and most Broadwell CPUs do have working transactional memory, though, which might be important for some fine-grained multithreading cases. (Haswell was going to have it, but it was disabled in a microcode update after a rare bug was discovered in the implementation.)

AVX512 is the next big thing that's widely useful but Haswell doesn't have, so maybe Apple will add support for a Cannonlake or Ice Lake slice at some point.

I wouldn't recommend making a separate build for Broadwell or Skylake (with any dispatching mechanism), unless you know you can take advantage of a specific new feature and it makes a significant difference.

A separate build could still be useful for Sandybridge, though: it gets AVX support without AVX2, which matters especially for 256-bit FP math but also saves movdqa instructions in 128-bit integer vector code. It also gets SSE4.x and popcnt, and there are no partial-flag problems in an extended-precision adc loop using dec/jnz.
