Is there an advantage to specifying "-mfpu=neon-vfpv3" over "-mfpu=neon" for ARMs with separate pipelines?

Problem description

My Zynq-7000 ARM Cortex-A9 processor has both the NEON and the VFPv3 extensions, and the Zynq-7000 TRM says that the processor is configured to have "Independent pipelines for VFPv3 and advanced SIMD instructions".

So far I have compiled my programs with Linaro GCC 6.3-2017.05 and the -mfpu=neon option to make use of SIMD instructions. But if the compiler also has non-SIMD operations to issue, will it make a difference to use -mfpu=neon-vfpv3? Will GCC's instruction selection and scheduler emit instructions for both units, so that both pipelines are used and CPU utilization increases?

Solution

Technically, yes.

In reality, no.

NEON is optional on ARMv7.

Licensees can choose one of the following configurations:

  • none
  • VFP only
  • NEON plus VFP

Unlike NEON, VFP comes in several versions on ARMv7; the most notorious is the VFP-lite on the Cortex-A8, which is not pipelined and is therefore extremely slow.

Therefore, it technically makes sense to specify the CPU configuration and the architecture version via compiler options, so that the compiler can generate the most optimized machine code for that particular architecture and configuration.

In reality, however, compilers these days ignore most of these build options, and even directives.

And the fact that VFP and NEON instructions are issued to different pipelines won't help much, if at all, since both share the same register bank.

Boosting NEON performance by using as many registers as possible gains far more than letting the VFP run in parallel.

It puzzles me why, and how, so many people put so much trust in free compilers these days.

The best ARM compiler available is, hands down, ARM's own, which comes with the $6k+ DS-5 Ultimate Edition. Their support is excellent, but I'm not sure it justifies the price tag.
