Post-process `objdump --disassemble` with ARM cycle counts

Views: 14 Published: 2022/1/17 13:41:37 Tags: gcc open-source arm objdump


Problem description

Is there a script available for post-processing `objdump --disassemble` output to annotate it with cycle counts? Especially for the ARM family. ~~Most of the time this would only be a pattern match with a table lookup for the count. I guess annotations like +5M for five memory cycles might be needed.~~ Perl, Python, bash, C, etc. are all fine. I think this can be done generically, but I am interested in ARM, which has an orthogonal instruction set. There is a thread on the 68HC11 about doing the same thing. The script would need a CPU model option to select the appropriate cycle counts; I think these counts already exist in the gcc machine descriptions.
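The table-lookup idea described above can be sketched in Python. This is a minimal illustration, not an existing tool: the names `CYCLE_TABLES`, `base_mnemonic`, and `annotate`, and the CPU key `example-cpu`, are all hypothetical, and the cycle numbers are placeholders rather than values from an ARM manual or the gcc machine descriptions. It also assumes the tab-separated instruction layout that GNU objdump typically emits.

```python
import re

# Placeholder table: the numbers are illustrative only, NOT real timings.
# Each entry maps a base mnemonic to (cycles, memory accesses).
CYCLE_TABLES = {
    "example-cpu": {
        "add": (1, 0), "sub": (1, 0), "mov": (1, 0), "cmp": (1, 0),
        "mul": (2, 0), "ldr": (1, 1), "str": (1, 1),
        "b": (3, 0), "bl": (3, 0), "bx": (3, 0),
    },
}

CONDITIONS = ("eq", "ne", "cs", "hs", "cc", "lo", "mi", "pl", "vs", "vc",
              "hi", "ls", "ge", "lt", "gt", "le", "al")


def strip_cond(m):
    """Drop a trailing ARM condition code, if present."""
    for c in CONDITIONS:
        if m.endswith(c) and len(m) > len(c):
            return m[:-len(c)]
    return m


def strip_s(m):
    """Drop a trailing flag-setting 's' suffix, if present."""
    return m[:-1] if m.endswith("s") and len(m) > 1 else m


def base_mnemonic(mnemonic, table):
    """Return the table key for a mnemonic, or None if it is unknown."""
    m = mnemonic.lower().split(".")[0]  # drop width suffixes such as .w/.n
    # Exact match first, then condition stripping before 's' stripping, so
    # that "bls" resolves to "b" (branch if lower or same), not "bl".
    candidates = (m, strip_cond(m), strip_s(m),
                  strip_cond(strip_s(m)), strip_s(strip_cond(m)))
    for cand in candidates:
        if cand in table:
            return cand
    return None


def annotate(lines, cpu="example-cpu"):
    """Append a cycle note to each instruction line; return (lines, total)."""
    table = CYCLE_TABLES[cpu]
    total = 0
    out = []
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        # Instruction lines look like "  addr:\tencoding \tmnemonic\toperands".
        if len(fields) >= 3 and re.fullmatch(r"\s*[0-9a-f]+:", fields[0]):
            base = base_mnemonic(fields[2].strip(), table)
            if base is None:
                note = "?"  # unknown to this table; left for the reader
            else:
                cycles, mem = table[base]
                total += cycles
                note = str(cycles) + (f" +{mem}M" if mem else "")
            line = f"{line}\t; cycles: {note}"
        out.append(line)
    return out, total
```

A caller could feed it the lines of `objdump --disassemble` output (for example read from standard input) and print the annotated lines. Mnemonics missing from the table are marked `?` rather than guessed, and the running total ignores pipeline interlocks entirely.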

I don't think there is an objdump switch for this, but an RTFM pointer would be great.

To clarify, assumptions such as a best-case memory subsystem, as is the case when the code executes from cache, are fine. The goal is not a 100% accurate cycle count matching some particular running machine. It must be possible to get a reasonable estimate, otherwise compiler design would be impossible.

As DWelch points out, a simple running total is not possible on deeply pipelined architectures such as the more recent Cortex chips; the objdump post-processing would have to look at the surrounding opcodes. A gcc plug-in is more likely to be able to accomplish this, but since plug-in support is new (4.5+), I don't think such a thing exists. A script for the ARM926 is certainly possible and fairly simple.

The memory latency doesn't matter. The memory controller is like another CPU: it goes about its business while the CPU is doing arithmetic, etc. A good, well-tuned algorithm will overlap the memory accesses with the computations. By counting loads/stores and cycles, you can determine how much parallelism is achieved when you actively profile with a timer. The pipeline is significant because of interlocks between registers, but a cycle count for basic blocks can be reliably calculated and used even on modern ARM processors; it is just too complex for a simple script.
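As a rough illustration of counting loads/stores per basic block, the following Python sketch splits a disassembly listing into blocks and tallies instructions and memory operations in each. It is a simplification under stated assumptions: the function name `basic_blocks` and the regular expressions are hypothetical, only plain `b`/`bx` branches (with optional condition codes) end a block, calls such as `bl` do not, and control transfers through `pc` writes (for example `pop {pc}`) are not detected.

```python
import re

# Assumes GNU objdump's usual "addr:\tencoding \tmnemonic\toperands" layout.
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\t[0-9a-f ]+\t(\S+)\t?(.*)$")
# Plain branches with an optional condition code; "bl" does not match.
BRANCH_RE = re.compile(
    r"^(b|bx)(eq|ne|cs|hs|cc|lo|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al)?$")
# Mnemonic prefixes treated as memory operations (an assumption).
MEM_RE = re.compile(r"^(ldr|str|ldm|stm|push|pop)")
# Direct branch operands look like "8004 <sum+0x4>".
TARGET_RE = re.compile(r"([0-9a-f]+)\s+<")


def basic_blocks(lines):
    """Return a list of dicts: start address, instruction and memory-op counts."""
    insns = []
    for line in lines:
        m = INSN_RE.match(line)
        if m:
            insns.append((int(m.group(1), 16), m.group(2).lower(), m.group(3)))

    # Leaders: the first instruction, every direct branch target, and the
    # instruction that follows a branch.
    leaders = set()
    if insns:
        leaders.add(insns[0][0])
    for i, (addr, mnem, ops) in enumerate(insns):
        if BRANCH_RE.match(mnem):
            t = TARGET_RE.match(ops)
            if t:
                leaders.add(int(t.group(1), 16))
            if i + 1 < len(insns):
                leaders.add(insns[i + 1][0])

    blocks = []
    for addr, mnem, ops in insns:
        if addr in leaders or not blocks:
            blocks.append({"start": addr, "insns": 0, "mem_ops": 0})
        blocks[-1]["insns"] += 1
        if MEM_RE.match(mnem):
            blocks[-1]["mem_ops"] += 1
    return blocks
```

The per-block counts could then be set against a timer measurement of the same code to judge how well the memory accesses overlap with computation. The sketch itself makes no timing claims.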
