tinyAVR: best known multiplication routines for 8-bit and 16-bit factors?

Question

"Faster than avr200b.asm"? The mpy8u routine from avr200b.asm, for those processors of Atmel's AVR family that do not implement any of the MUL instructions, seems pretty generic, but mpy16u looks sloppy for rotating both lower result bytes 16 times instead of 8. Antonio presented a fast 16×16→16 unsigned multiplication using 64 cycles worst case, excluding call/return overhead.
I arbitrarily suggest as optimisation goals, in order of decreasing priority: worst-case cycle count, word count (RAM and flash), register usage, and expected cycle count.
(There are reduced-core AVRs ("single digit" ATtiny, ATtiny10/20/40) with differences including timing, which I suggest ignoring.)

(Caution: Don't take any claim herein for granted, at least not without independent affirmation.)

What are the best currently known 8×8→8/16, 16×16→16/32 and 16×8→16/24 bit multiplication routines for AVRs without MUL?

Recommended answer

Implementations bordering on space-conscious (for reference, if not sanity).
Resource figures should probably be qualified (g: wild guess, G: guessed, e: educated guess, E: estimated, s: simulated, a: analysed, A: analysed & substantiated, if by simulation, m: measured). words×worstCaseCycleCount is a cost measure akin to Area×Delay in IC design (a single figure of "merit"?).

algorithm            bits       cycles     words  words   regs    remarks
                                 wc  exp          ×wccc   excl.
                                                          a,b,p
shift factor left    16×16→16    61  56      87    5307           see other answer
                                 62  57      62    3844           see other answer
                                 73  68      37    2701
                                 81  77      24    1944           (see edit history)
                                 85  70g     15    1275           w*expcc~1050
                                108  64g     18    1944           w*expcc~1150
jump table                       51E 49g    888e    44K   G       for reference (almost done)
                                 44E 39g   2888E   127K   e       for reference

(I checked the identical "wordcycle entries" more than once.)
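
The words×wccc column can be recomputed on a host machine as a quick cross-check. The sketch below is plain C rather than AVR assembly, and the helper name area_delay is made up for illustration; the inputs are the words and worst-case cycle figures from the table.

```c
#include <stdint.h>

/* Hypothetical helper: code size in words times worst-case cycle count,
   the table's "Area x Delay"-style cost figure. */
uint32_t area_delay(uint32_t words, uint32_t wc_cycles)
{
    return words * wc_cycles;
}
```

For instance, area_delay(15, 85) gives 1275 and area_delay(18, 108) gives 1944, matching the last two "shift factor left" rows.
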
Macros, which could conceivably be factored out:

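.MACRO addA     ;   adds factor "a" to product "p" (not in the listing as posted; inferred from its uses)
    add     p0, a0  ; +1
    adc     p1, a1  ; +2
.EndM
.MACRO doubleA  ;   doubles (shifts/weights) factor "a"
    add     a0, a0  ; +1
    adc     a1, a1  ; +2
.EndM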
.MACRO doHighB  ;   "does" bit in b1, bit number as a parameter
    sbrc    b1, @0  ; 1
    add     p1, a0  ; 2
.EndM
.MACRO condAdd
    doHighB @0      ; +2
    sbrs    b0, @1  ; +3
    rjmp    PC+3    ;+4/5
    addA            ; +6
.EndM
.MACRO step16; "do" 2 bits, bit# in b1 and b0 as a parameter
    condAdd @0, @0  ; +6
    doubleA         ; +8
.EndM

16×16→16 bits, 85 cycles in 15 words (the 81-cycle, 24-word variant is in the edit history):

mpy16x16:           ;       0
    clr     p0      ; 1
    clr     p1      ; 2
; wanting early out: shifting the factor; faster from Little End
    lsr     b0      ; 3
    brcc    shiftB1 ;4/5
addFull:
    addA            ; 2
shiftB1:            ;       due to handling this 2nd multiplier
    lsr     b1      ; 3     bit even if the multiplicand is zero
    brcc    pc+2    ;4/5    after the first shift, the earlyOutA
addHigh:            ;       variant is 3 cycles slower than 4.8
    add     p1, a0  ; 5     libgcc __mulhi3 - for * 0 or 0x8000
shiftA:
    doubleA         ; 7         why is adc zero-flag handling ...
#if 1||earlyOutA
    brne    shiftB0 ;+1/2   7   ... different from sbc/sbci/cpc?
    tst     a0      ;+ 2
    breq    done    ;+ 3/-1upto-69?
#endif
shiftB0:
    lsr     b0      ; 8
    brcs    addFull ;9/10
    sbci    b1, 0   ; 10    presume zero or high reg?
    brne    shiftB1 ;11/12-2
done:               ; wc:   8*10+5=85   @15+1 words (?!)
    ret             ; best: 14 (0=b&0xfffe) (none for a)
                    ;(earlyOutA: wc: 8*13+4=108 @18+1 words)
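
To check a routine like this without a simulator, it can help to mirror it in C on a host machine. The following is a sketch and not part of the original answer: the function names are invented, the uint8_t variables stand in for the registers a0/a1, b0/b1 and p0/p1, and it models only the arithmetic (including the earlyOutA test), not flags or cycle counts.

```c
#include <stdint.h>

/* Host-side model of the looped 16x16->16 routine (names are hypothetical).
   Each uint8_t variable stands in for one AVR register. */
uint16_t mpy16x16_loop_model(uint16_t a, uint16_t b)
{
    uint8_t a0 = (uint8_t)a, a1 = (uint8_t)(a >> 8);
    uint8_t b0 = (uint8_t)b, b1 = (uint8_t)(b >> 8);
    uint8_t p0 = 0, p1 = 0;

    do {
        if (b0 & 1) {                           /* addFull: p += a */
            uint16_t sum = (uint16_t)(p0 + a0);
            p0 = (uint8_t)sum;
            p1 = (uint8_t)(p1 + a1 + (sum >> 8));
        }
        b0 >>= 1;                               /* lsr b0 */
        if (b1 & 1)                             /* addHigh: p1 += a0 */
            p1 = (uint8_t)(p1 + a0);
        b1 >>= 1;                               /* lsr b1 */
        a1 = (uint8_t)((a1 << 1) | (a0 >> 7));  /* doubleA */
        a0 = (uint8_t)(a0 << 1);
        if ((a0 | a1) == 0)                     /* earlyOutA: a shifted out */
            break;
    } while ((b0 | b1) != 0);                   /* stop once no bits of b remain */

    return (uint16_t)((p1 << 8) | p0);
}

/* Counts mismatches against C's own truncated product over a strided sweep. */
unsigned loop_model_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t a = 0; a <= 0xFFFF; a += 257)
        for (uint32_t b = 0; b <= 0xFFFF; b += 251)
            if (mpy16x16_loop_model((uint16_t)a, (uint16_t)b)
                    != (uint16_t)(a * b))
                bad++;
    return bad;
}
```

The model tests each bit before shifting, whereas the assembly shifts first and branches on carry; the arithmetic is the same, and the model keeps the high byte of b contributing only a0 to p1, as addHigh does.
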

16×16→16 bits, 73 cycles, 37 words:

mpy16x16:           ;       0
    clr     p0      ; 1
    clr     p1      ; 2
    rcall   nibble  ; 9     incl. ret (>16bit PC AVRs have mul(?))
    swap    b0      ; 10
    swap    b1      ; 11
    doubleA         ; 13
nibble:
    step16  0       ; +8
    step16  1       ; +16
    step16  2       ; +24
    doHighB 3       ; +26
    sbrs    b0, 3   ;27/28
    ret             ;       yikes
    addA            ; +30
    ret             ;       30 Hrrm *2+13 = 73 @ 4*8+5 = 37 words
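
The control flow of the nibble variant (call "nibble" once, swap, double, then fall through into "nibble" again) is easy to misread, so here is a host-side C sketch of it. It is an illustration under assumptions, not the author's code: the struct and function names are invented, the fields mirror the registers of the listing, and timing is not modelled.

```c
#include <stdint.h>

/* Register file of the model; field names follow the listing. */
struct regs { uint8_t a0, a1, b0, b1, p0, p1; };

static void add_a(struct regs *r)       /* addA: p += a (16 bit) */
{
    uint16_t sum = (uint16_t)(r->p0 + r->a0);
    r->p0 = (uint8_t)sum;
    r->p1 = (uint8_t)(r->p1 + r->a1 + (sum >> 8));
}

static void double_a(struct regs *r)    /* doubleA: a <<= 1 */
{
    r->a1 = (uint8_t)((r->a1 << 1) | (r->a0 >> 7));
    r->a0 = (uint8_t)(r->a0 << 1);
}

static uint8_t swap_nibbles(uint8_t v)  /* swap */
{
    return (uint8_t)((v << 4) | (v >> 4));
}

/* One pass of "nibble": bits 0..3 of b1 and b0, doubling a in between. */
static void nibble(struct regs *r)
{
    for (int bit = 0; bit < 4; bit++) {
        if (r->b1 & (1u << bit))        /* doHighB */
            r->p1 = (uint8_t)(r->p1 + r->a0);
        if (r->b0 & (1u << bit))        /* conditional addA */
            add_a(r);
        if (bit < 3)                    /* step16 doubles a; bit 3 does not */
            double_a(r);
    }
}

uint16_t mpy16x16_nibble_model(uint16_t a, uint16_t b)
{
    struct regs r = { (uint8_t)a, (uint8_t)(a >> 8),
                      (uint8_t)b, (uint8_t)(b >> 8), 0, 0 };
    nibble(&r);                         /* rcall nibble: low nibbles of b */
    r.b0 = swap_nibbles(r.b0);
    r.b1 = swap_nibbles(r.b1);
    double_a(&r);                       /* the doubling skipped after bit 3 */
    nibble(&r);                         /* fall through: high nibbles of b */
    return (uint16_t)((r.p1 << 8) | r.p0);
}

/* Counts mismatches against C's own truncated product over a strided sweep. */
unsigned nibble_model_mismatches(void)
{
    unsigned bad = 0;
    for (uint32_t a = 0; a <= 0xFFFF; a += 257)
        for (uint32_t b = 0; b <= 0xFFFF; b += 251)
            if (mpy16x16_nibble_model((uint16_t)a, (uint16_t)b)
                    != (uint16_t)(a * b))
                bad++;
    return bad;
}
```

In total a is doubled seven times (three per pass plus once between passes), so each of the eight bit positions of b0 and b1 sees the correctly weighted factor.
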
