Is there an advantage of specifying "-mfpu=neon-vfpv3" over "-mfpu=neon" for ARMs with separate pipelines?


Question

My Zynq-7000 ARM Cortex-A9 processor has both the NEON and the VFPv3 extensions, and the Zynq-7000-TRM says that the processor is configured to have "Independent pipelines for VFPv3 and advanced SIMD instructions".

So far I have compiled my programs with Linaro GCC 6.3-2017.05 and the -mfpu=neon option to make use of SIMD instructions. But when the compiler also has non-SIMD floating-point operations to issue, will it make a difference to use -mfpu=neon-vfpv3? Will GCC's instruction selection and scheduler emit instructions for both units, so that both pipelines can be used and CPU utilization increases?

Recommended answer

Technically, yes.

Realistically, no.

NEON is optional on ARMv7.

Licensees can choose one of the following configurations:

  • VFP only
  • NEON plus VFP

Unlike NEON, VFP has come in several versions on ARMv7. The VFP-lite on Cortex-A8 is the most notorious one: it is not pipelined and is therefore extremely slow.

Therefore, it technically makes sense to specify the CPU configuration and the architecture version via compiler options, so that the compiler can generate the best machine code for that particular architecture and configuration.
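One way to see what the compiler was actually told to target is to look at the ACLE feature macros it predefines. The sketch below is only an illustration: the function names `fpu_target` and `report_fpu_target`, the toolchain prefix and the `-mfloat-abi=hard` setting in the comment are assumptions, not something taken from the question. As far as I know, current GCC documentation lists `neon` as an alias for `neon-vfpv3`, so both spellings would be expected to report the same thing here.

```c
/*
 * Possible build line (toolchain prefix and -mfloat-abi are assumptions):
 *   arm-linux-gnueabihf-gcc -O2 -mcpu=cortex-a9 -mfpu=neon-vfpv3
 *       -mfloat-abi=hard -c fpu_target.c
 */
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: names the FP/SIMD unit selected at compile time,
 * based on the ACLE macros __ARM_NEON and __ARM_FP. */
const char *fpu_target(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return "neon";   /* Advanced SIMD enabled (implies hardware FP) */
#elif defined(__ARM_FP)
    return "vfp";    /* hardware floating point without NEON */
#else
    return "none";   /* not an ARM hardware-FP target, e.g. a host build */
#endif
}

/* Hypothetical helper: prints the architecture version (if the compiler
 * defines __ARM_ARCH) and the selected FP/SIMD unit. */
void report_fpu_target(FILE *out)
{
    assert(out != NULL);
#ifdef __ARM_ARCH
    fprintf(out, "ARM architecture %d, ", __ARM_ARCH);
#endif
    fprintf(out, "FP/SIMD target: %s\n", fpu_target());
}
```

Building the same file once with each `-mfpu` value and comparing the generated assembly (`-S`) is a simple way to check whether the option changes anything for a given compiler version.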

In reality, however, compilers these days ignore most of these build options, and even directives as well.

And the fact that VFP and NEON instructions are assigned to different pipelines won't help much, if at all, since both share the same register bank.
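The shared register bank can be pictured with a small host-side model. This is only a sketch in plain C and does not touch real registers: the type `simd_regfile` and both helpers are hypothetical names, and it assumes a little-endian host and the 32-doubleword register file (VFPv3-D32) that comes with NEON.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Software model of the register file shared by VFP and NEON:
 * the same 256 bytes seen as 16 Q, 32 D or 32 S registers. */
typedef union {
    uint8_t  bytes[256];
    uint64_t d[32];                     /* D0..D31 */
    uint32_t s[32];                     /* S0..S31 overlay D0..D15 only */
    struct { uint64_t lo, hi; } q[16];  /* Qn = D(2n) (lo), D(2n+1) (hi) */
} simd_regfile;

/* Nonzero if scalar register S<s_index> lives inside Q<q_index>.
 * Valid for s_index 0..31 and q_index 0..15. */
int s_reg_overlaps_q(int s_index, int q_index)
{
    return s_index / 4 == q_index;
}

/* A "NEON" write to Q0 followed by a "VFP" write to S1:
 * the scalar write changes the upper half of D0, i.e. part of Q0. */
uint64_t write_s_then_read_d(void)
{
    simd_regfile rf;
    memset(&rf, 0, sizeof rf);
    rf.q[0].lo = 0x1111111122222222ULL;  /* vector data in Q0 */
    rf.s[1]    = 0xAAAAAAAAu;            /* scalar result lands in S1 */
    return rf.d[0];                      /* vector data is now modified */
}
```

In this model, any scalar value kept in S0..S31 takes space away from Q0..Q7, which is why running VFP code next to NEON code competes for registers instead of adding capacity.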

Boosting NEON's performance by using as many registers as possible would bring much more than letting the VFP run in parallel.

It puzzles me why and how so many people put so much trust in free compilers these days.

The best ARM compiler available is, hands down, ARM's own, which comes with the $6k+ DS-5 Ultimate Edition. Their support is excellent, but I'm not sure it justifies the price tag.
