How to delay an ARM Cortex M0+ for n cycles, without a timer?


Question

I want to delay an ARM Cortex M0+ for n cycles, without using a timer, with the smallest possible code size. (I think this mandates use of assembly.)

A delay of 0 cycles is simply no code. A delay of 1 cycle is a single NOP. A delay of 2 cycles is two NOPs.

At what point is it (code-size) efficient to start looping?

How many cycles does the tightest possible loop take? What is the setup time?

Notes added after the answers:

The following C code:

register unsigned char counter = 100;
while (counter-- > 0) {
  asm("");
}

when compiled with gcc and -O3 gives:

    mov r3, #100
.L5:
    sub r3, r3, #1
    uxtb    r3, r3
    cmp r3, #0
    bne .L5

This either illustrates that there is still purpose in hand-coding ARM assembly, or (much more likely) that the C code above is not the best way of conveying to the compiler what I want to do.

Comments?

Recommended answer

The code is going to depend on exactly what n is, and whether it needs to be dynamically variable, but given the M0+ core's instruction timings, establishing bounds for a particular routine is pretty straightforward.

For the smallest possible (6-byte) complete loop with a fixed 8-bit immediate counter:

   movs  r0, #NUM    ;1 cycle
1: subs  r0, r0, #1  ;1 cycle
   bne   1b          ;2 if taken, 1 otherwise

With NUM=1 we get a minimum of 3 cycles, plus 3 cycles for every extra iteration, up to NUM=255 at 765 cycles (of course, you could have 2^32 iterations from NUM=0, but that seems a bit silly). That puts the lower bound for a loop being practical at about 6 cycles. With a fixed loop it's easy to pad NOPs (or even nested loops) inside it to lengthen each iteration, and before/after it to align to a non-multiple of the loop length. If you can arrange for the number of iterations to be ready in a register before you need to start waiting, then you can lose the initial mov and have pretty much any multiple of 3 or more cycles, minus one. If you need single-cycle resolution for a variable delay, the initial setup cost is going to be somewhat higher to correct for the remainder (a computed branch into a NOP sled is what I'd do for that).
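To make that arithmetic concrete, here is a rough host-side sketch in C that only models the cycle and byte counts; it does not produce a delay itself. It assumes the timings quoted above (movs, subs and nop at 1 cycle, bne at 2 cycles when taken and 1 otherwise), zero wait states, and that all four instructions use 2-byte Thumb encodings. The names `loop_cycles`, `plan_delay` and `struct delay_plan` are made up for illustration.

```c
#include <assert.h>

/* Assumed Cortex-M0+ costs, zero wait states, interrupts off:
   movs = 1, subs = 1, nop = 1, bne = 2 if taken / 1 otherwise.
   Each of these is assumed to be a 2-byte Thumb instruction. */
enum { LOOP_BYTES = 6, INSN_BYTES = 2, MAX_NUM = 255 };

/* Cycles taken by:  movs r0,#num / 1: subs r0,r0,#1 / bne 1b  */
static unsigned loop_cycles(unsigned num)
{
    assert(num >= 1 && num <= MAX_NUM);
    /* movs + one subs per pass + taken branches + final fall-through */
    return 1u + num * 1u + (num - 1u) * 2u + 1u;
}

/* Hypothetical plan for a fixed delay: loop count plus trailing NOPs. */
struct delay_plan {
    unsigned num;   /* loop immediate, 0 means "no loop, NOPs only" */
    unsigned nops;  /* NOPs to emit in addition to the loop */
    int ok;         /* 0 if n is too large for one 8-bit loop */
};

static struct delay_plan plan_delay(unsigned n)
{
    struct delay_plan p = { 0u, n, 1 };   /* default: n plain NOPs */
    unsigned num = n / 3u;                /* each loop pass costs 3 cycles */
    unsigned rem = n % 3u;                /* leftover cycles become NOPs */

    if (num > MAX_NUM) {                  /* would need nesting or padding */
        p.ok = 0;
        return p;
    }
    /* Use the loop only when it is strictly smaller than n NOPs. */
    if (num >= 1u && LOOP_BYTES + rem * INSN_BYTES < n * INSN_BYTES) {
        p.num = num;
        p.nops = rem;
    }
    return p;
}
```

Under these assumptions the loop ties with plain NOPs on code size for 3 to 5 cycles and only becomes strictly smaller from 6 cycles upward, which matches the "about 6 cycles" lower bound above.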

I'm assuming that if you're at the point of cycle-critical timing you've already got interrupts off (otherwise throw in another cycle somewhere for CPSID), and that you don't have any bus wait states adding extra cycles to instruction fetches.

As for trying to do it in C: the fact that you have to hack in an empty asm to keep the "useless" loop from being optimised away is a tip-off. The abstract C machine has no notion of "instructions" or "cycles" so there is simply no way to reliably express this in the language. Trying to rely on particular C constructs to compile to suitable instructions is extremely fragile - change a compiler flag; upgrade the compiler; change some distant code which affects register allocation which affects instruction selection; etc. - pretty much anything could change the generated code unexpectedly, so I'd say hand-coded assembly is the only sensible approach for cycle-accurate code.
