How to delay an ARM Cortex M0+ for n cycles, without a timer?

Problem description

I want to delay an ARM Cortex M0+ for n cycles, without using a timer, with the smallest possible code size. (I think this mandates use of assembly.)

A delay of 0 cycles is simply no code. A delay of 1 cycle is a single NOP. A delay of 2 cycles is two NOPs.

At what point does it become (code-size) efficient to start looping?

How many cycles does the tightest possible loop take? What is the setup time?

Post-answer notes:

Consider the following C code:

register unsigned char counter = 100;
while (counter-- > 0) {
  asm("");
}

When compiled with gcc at -O3, it gives:

    mov r3, #100
.L5:
    sub r3, r3, #1
    uxtb    r3, r3
    cmp r3, #0
    bne .L5

This either illustrates that there is still purpose in hand-coding ARM assembly, or (much more likely) that the C code above is not the best way of conveying to the compiler what I want to do.

Comments?

Recommended answer

The code is going to depend on exactly what n is, and whether it needs to be dynamically variable, but given the M0+ core's instruction timings, establishing bounds for a particular routine is pretty straightforward.

For the smallest possible (6-byte) complete loop with a fixed 8-bit immediate counter:

   movs  r0, #NUM    @ 1 cycle; NUM is an 8-bit immediate (1..255)
1: subs  r0, r0, #1  @ 1 cycle
   bne   1b          @ 2 cycles if taken, 1 otherwise

With NUM=1 we get a minimum of 3 cycles, plus 3 cycles for every extra iteration, up to 765 cycles at NUM=255 (of course, you could have 2^32 iterations from NUM=0, but that seems a bit silly). That puts the lower bound for a loop being practical at about 6 cycles: a NOP is 2 bytes and 1 cycle, so below that a plain run of NOPs is no bigger than the 6-byte loop plus its padding.

With a fixed loop it's easy to pad with NOPs (or even nested loops) inside it to lengthen each iteration, and before/after it to hit a delay that isn't a multiple of the loop length. If you can arrange for the iteration count to be ready in a register before you need to start waiting, then you can lose the initial movs and get 3 cycles per iteration minus one, i.e. 2, 5, 8, ... cycles. If you need single-cycle resolution for a variable delay, the initial setup cost is going to be somewhat higher to correct for the remainder (a computed branch into a NOP sled is what I'd do for that).
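The arithmetic above can be checked with a small host-side model. This is only a sketch: the names loop_cycles, delay_plan and plan_delay are made up for illustration, and it assumes the timings quoted above (no wait states, interrupts off) plus a 2-byte, 1-cycle NOP. It picks between a straight run of NOPs and the 6-byte loop followed by padding NOPs, preferring straight-line code on a size tie.

```c
/* Cycles for: movs r0,#num ; 1: subs r0,r0,#1 ; bne 1b
 * = 1 (movs) + 3*(num-1) (taken iterations) + 2 (final, not taken) = 3*num.
 * Assumes Cortex-M0+ timings with no wait states. */
static unsigned loop_cycles(unsigned num)
{
    return 3u * num;
}

/* Hypothetical description of a fixed delay: a loop (optional) plus NOPs. */
typedef struct {
    unsigned num;    /* loop counter immediate; 0 means "no loop emitted" */
    unsigned nops;   /* NOPs emitted alongside (or instead of) the loop   */
    unsigned bytes;  /* total code size in bytes                          */
} delay_plan;

/* Choose the smaller encoding for a fixed delay of n cycles.
 * Padding is placed outside the loop only, so for n much larger than 765
 * the NOP count grows; nested loops or padding inside the loop would be
 * the next step there and are not modelled. */
static delay_plan plan_delay(unsigned n)
{
    delay_plan p;
    unsigned num = n / 3u;
    if (num > 255u)
        num = 255u;                      /* 8-bit immediate limit */

    unsigned pad = n - loop_cycles(num); /* cycles left over for NOPs */
    unsigned loop_bytes = 6u + 2u * pad; /* 3 instructions + padding  */
    unsigned nop_bytes = 2u * n;         /* n NOPs, 2 bytes each      */

    if (num == 0u || loop_bytes >= nop_bytes) {
        p.num = 0u;                      /* straight NOPs win or tie  */
        p.nops = n;
        p.bytes = nop_bytes;
    } else {
        p.num = num;
        p.nops = pad;
        p.bytes = loop_bytes;
    }
    return p;
}
```

Under these assumptions, plan_delay(5) still comes out as five NOPs (10 bytes either way), plan_delay(6) is the bare loop with NUM=2 (6 bytes instead of 12), and plan_delay(7) is NUM=2 plus one NOP (8 bytes).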

I'm assuming that if you're at the point of cycle-critical timing you've already got interrupts off (otherwise throw in another cycle somewhere for CPSID), and that you don't have any bus wait states adding extra cycles to instruction fetches.

As for trying to do it in C: the fact that you have to hack in an empty asm to keep the "useless" loop from being optimised away is a tip-off. The abstract C machine has no notion of "instructions" or "cycles", so there is simply no way to reliably express this in the language. Trying to rely on particular C constructs to compile to suitable instructions is extremely fragile: change a compiler flag, upgrade the compiler, or change some distant code which affects register allocation and therefore instruction selection, and the generated code can change unexpectedly. So I'd say hand-coded assembly is the only sensible approach for cycle-accurate code.
