Some doubts in optimizing the NEON code

Question

I wrote some NEON code in assembly, aiming for maximum optimization. Although the numbers seem satisfactory, I wanted to understand whether it could be optimized further. I then came across an online tool that counts the cycles of each instruction.

Here is the link to my code: http://pulsar.webshaker.net/ccc/sample-115d4c29

It clearly marked the areas of concern, but I could not understand why those instructions incur the overhead.

The code is divided into 7 sections, marked in the comments, to make it easier to refer to.

Thanks in advance. :)

Recommended answer

You can try this link:

http://pulsar.webshaker.net/ccc/beta-sample-115d4c29

This uses beta version 0.9 of the cycle counter. The main difference is that the NEON simulator no longer uses 2 distinct pipelines, because the Cortex-A9 cannot execute 2 NEON instructions in one cycle.

I have started to update some parts of the cycle counter.

The results are:

- The cycle information is more accurate for the Cortex-A9.

- The result is easier to read, because most of the NEON latency information is due to unpaired instructions.

Orange means latency due to waiting for the pipeline.

Red means latency due to a register conflict.

The number specified near the register is not the number of lost cycles. It is the maximum number of instructions you could place before this instruction.
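
To make the register-conflict idea concrete, here is a minimal sketch in Python. It is a toy model, not the cycle counter's actual algorithm. It assumes a single-issue, in-order pipeline, and the latencies (4 cycles for a multiply, 2 for an add) are hypothetical values chosen for illustration, not real Cortex-A9 timings. The `Instr` class and `simulate` function are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str      # label only, e.g. a NEON-style mnemonic
    dst: str       # register written
    srcs: tuple    # registers read
    latency: int   # HYPOTHETICAL cycles until dst can be read


def simulate(program):
    """Toy single-issue, in-order model.

    One instruction issues per cycle. An instruction stalls until every
    source register it reads has been produced (a register conflict).
    Returns (total_cycles, [(name, issue_cycle, stall_cycles), ...]).
    """
    ready_at = {}  # register -> first cycle its value can be read
    cycle = 0
    report = []
    for ins in program:
        earliest = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        stall = max(0, earliest - cycle)
        issue = cycle + stall
        ready_at[ins.dst] = issue + ins.latency
        report.append((ins.name, issue, stall))
        cycle = issue + 1
    return cycle, report


# Each add consumes the result of the multiply right before it.
naive = [
    Instr("vmul q0", "q0", ("q8", "q9"), 4),
    Instr("vadd q1", "q1", ("q0", "q10"), 2),
    Instr("vmul q2", "q2", ("q11", "q12"), 4),
    Instr("vadd q3", "q3", ("q2", "q13"), 2),
]

# Same work, with the independent multiply moved up to fill the gap.
reordered = [naive[0], naive[2], naive[1], naive[3]]

for label, prog in (("naive", naive), ("reordered", reordered)):
    total, report = simulate(prog)
    print(label, "total cycles:", total)
    for name, issue, stall in report:
        print(f"  {name}: issue at {issue}, stalled {stall}")
```

Under these assumptions the naive order takes 10 cycles and the reordered one takes 6, because the second multiply fills a slot the first add would otherwise spend waiting for q0. In this simplified model each stall cycle happens to be a slot where one independent instruction could go; the real tool's number is, as noted above, not a count of lost cycles, so treat the sketch only as intuition for why moving independent instructions ahead of a flagged one helps.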

Hope this helps!
