Finding the absolute value of a number in 8085 microprocessor assembly language

Question

I have a task of finding the absolute value of any given number in 8085 assembly language.

The algorithm is the following (found on the internet):

mask = n >> 7 (number itself is 8 bits)

(mask + n) XOR mask

My question is how I would implement this in assembly language. It seems that I should be using the "RRC" instruction, but that performs a circular shift on the number, and the algorithm doesn't seem to work.

Any ideas would be appreciated. Cheers.

Recommended answer

The n>>7 in that abs algorithm is an arithmetic right shift that shifts in copies of the sign bit, so you get -1 for negative n, 0 for non-negative. (In 2's complement, the bit pattern for -1 has all bits set).

Then you use this to do either nothing (n+0) ^ 0 or to do 2's complement negation "manually" as -n = (n + (-1)) ^ -1 = ~(n-1).

See "How to prove that the C statement -x, ~x+1, and ~(x-1) yield the same results?" for 2's complement identities. XOR with all-ones is bitwise NOT. Adding mask = -1 is of course n-1.
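As a rough illustration of the mask algorithm described above, here is a minimal C sketch for 8-bit values. The function names `sign_mask8` and `abs8_branchless` are made up for this example. It builds the mask from an unsigned shift and a negation rather than relying on `>>` of a negative signed value, since that shift is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical helper: 0x00 for non-negative n, 0xFF (i.e. -1) for negative n. */
uint8_t sign_mask8(int8_t n)
{
    uint8_t u = (uint8_t)n;
    /* u >> 7 isolates the sign bit (0 or 1); 0 - bit broadcasts it to all bits. */
    return (uint8_t)(0u - (unsigned)(u >> 7));
}

/* (mask + n) XOR mask, with the result treated as unsigned 8-bit. */
uint8_t abs8_branchless(int8_t n)
{
    uint8_t u = (uint8_t)n;
    uint8_t mask = sign_mask8(n);
    uint8_t sum = (uint8_t)(u + mask);   /* n + 0, or n - 1 */
    return (uint8_t)(sum ^ mask);        /* unchanged, or ~(n - 1) = -n */
}
```

With mask = 0 both steps leave n alone; with mask = 0xFF the add is n-1 and the XOR is a bitwise NOT, which is the ~(n-1) identity for negation.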

Branches are cheap, and the register copying involved in creating and using a 0 or -1 (according to the sign of a number) adds up. (Although I did come up with a way to implement this in only 6 bytes of code, same code size as the branchy version.)

On 8085, just implement it the simple way: if(n<0) n=-n;

(Treat the result as unsigned; note that -0x80 = 0x80 in 8-bit. If you assume it's signed-positive after abs, you'll be wrong for the most-negative input.)
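A small C sketch of the "simple way", assuming 8-bit two's complement values; the name `abs8_branchy` is hypothetical. It negates with ~a + 1 on the unsigned bit pattern, which also shows why the most-negative input comes back as 0x80.

```c
#include <stdint.h>

/* if (n < 0) n = -n;  with the result returned as an unsigned 8-bit value. */
uint8_t abs8_branchy(int8_t n)
{
    uint8_t a = (uint8_t)n;
    if (a & 0x80u) {                 /* sign bit set: the value is negative */
        a = (uint8_t)(~a + 1u);      /* 2's complement negation: ~a + 1 */
    }
    return a;                        /* 0x80 stays 0x80, i.e. 128 as unsigned */
}
```

Reading the return value as `int8_t` would turn 0x80 back into -128, so callers should keep it unsigned.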

This should be trivial with a conditional branch over a negation; 8085 does have branches that depend on the sign bit. (Not signed-compare in general though, unless you use the undocumented k flag = signed overflow.) Set flags according to A, then JP over a negation. (The "Plus" condition tests that Sign flag = 0, so it's actually testing for non-negative instead of strictly positive.)

I don't see a neg instruction in https://www.daenotes.com/electronics/digital-electronics/instruction-set-intel-8085 so you could zero another register and sub, or you could negate the accumulator in place with a 2's complement identity like CMA (NOT A) ; INR A (accumulator += 1) instead of mov to another reg and subtracting from A=0.

8085 has cheap branching, not like a modern pipelined CPU where branching can be expensive on branch mis-predictions. The mask = n >> 31 or equivalent for branchless abs is useful there and the whole thing is typically only 3 or 4 instructions. (8085 only has shift-by-1 instructions; later ISAs including modern x86 have fast immediate shifts that can do n >> 31 in a single instruction, usually with good latency like 1 cycle.)

; total 6 bytes.  (jumps are opcode + 16-bit absolute target address)
    ana  A              ; set flags from A&A
    jp  non_negative    ; jump if MSB was clear
    cma
    inr  A              ; A = ~A+1 = -A
 non_negative:
   ; unsigned A = abs(signed A) at this point

http://pastraiser.com/cpu/i8085/i8085_opcodes.html has an opcode map with cycle timings. 1-byte ALU register instructions take 4 cycles, 2-byte ALU reg instructions (with an immediate) take 7. Conditional branches take 7 cycles not-taken, 10 cycles taken.

  • For non-negative inputs (branch taken): the cost is 4 (ANA) + 10 (JP) = 14 cycles.
  • For negative inputs (branch not taken): 4 (ANA) + 7 (JP) + 4 (CMA) + 4 (INR) = 19 cycles.

(Timing calculations seem to be trivial; each instruction just has a single fixed cost, unlike modern pipelined superscalar out-of-order CPUs where throughput and latency are separate things and not every instruction can run on every execution port...)
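The cycle arithmetic above can be written out as a tiny C sketch. The constants are the timings quoted from the opcode map; the enum and function names are invented for this example.

```c
#include <stdint.h>

/* Cycle costs as quoted above (assumed from the linked opcode map). */
enum {
    CYC_ALU_REG       = 4,   /* 1-byte ALU register instruction */
    CYC_JCC_NOT_TAKEN = 7,   /* conditional jump, not taken */
    CYC_JCC_TAKEN     = 10   /* conditional jump, taken */
};

/* Cost of: ana a / jp non_negative / cma / inr a */
unsigned branchy_abs_cycles(int8_t n)
{
    if (n >= 0) {
        return CYC_ALU_REG + CYC_JCC_TAKEN;                  /* ANA + JP taken */
    }
    return CYC_ALU_REG + CYC_JCC_NOT_TAKEN + 2 * CYC_ALU_REG; /* + CMA + INR */
}
```

Because each instruction has one fixed cost, the total is just a sum along the executed path.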

Subtracting a register from itself with borrow (sbb a) is a somewhat well-known assembly trick for turning a compare condition into a 0 / -1 mask. You just need to get the MSB of your value into the carry flag, e.g. with A+A or a rotate. That gives you the n >> 7 value (0 or -1) you need for xor/add.

Just for fun, I tried implementing abs() branchlessly with this trick. This is the best I've come up with. Only use this if you need immunity to timing attacks, so clock cycle cost doesn't depend on input data. (Or for position-independent code; jumps use an absolute target address, not a +- relative offset.)

It has the advantage of leaving the original value intact in another register.

;;;   UNTESTED slower branchless abs
;; a = abs(b).  destroys c (or pick any other tmp reg)
;; these are all 1-byte instructions (4 cycles each)
   mov  a, b
   add  a         ; CF = sign bit
   sbb  a         ; A = n-n-CF = -CF.  0 or -1
   mov  c, a
   xra  b         ;  n         or    ~n
   sub  c         ; n-0 = n    or    ~n-(-1) = ~n+1 = -n

; uint8_t A = abs(int8_t B)

This is still only 6 bytes, same as branchy, but it costs 6*4 = 24 cycles.
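Since the listing is marked untested, one way to sanity-check it is to step through the same six operations in C. This is only a model under the assumption that ADD A sets carry from the old bit 7 and SBB A computes A - A - CF; the function name `abs8_sbb_model` and the variables standing in for registers are hypothetical.

```c
#include <stdint.h>

/* Models the branchless sequence with b as register B; returns final A. */
uint8_t abs8_sbb_model(uint8_t b)
{
    uint8_t a, c;
    unsigned cf;

    a = b;                          /* mov a, b */
    cf = (unsigned)(a >> 7) & 1u;   /* add a: carry out is the old sign bit */
    a = (uint8_t)(a + a);
    a = (uint8_t)(a - a - cf);      /* sbb a: 0x00 or 0xFF */
    c = a;                          /* mov c, a: save the mask */
    a ^= b;                         /* xra b: n or ~n */
    a = (uint8_t)(a - c);           /* sub c: n - 0, or ~n - (-1) = -n */
    return a;
}
```

Under this model the mask in C is 0xFF exactly when bit 7 of B was set, and B itself is never written.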

If XRA didn't affect flags we could sbi 0 for the -1 step. But it does always clear CF. I don't see a way around saving a copy of the 0 / -1 result. And we can't compute into B to do it in-place; 8085 is an accumulator machine. Where's 8086's 1-byte exchange-with-accumulator when you need it? xchg a,b would have been useful.

If your value starts in A, you need to copy it somewhere else, so you need to destroy two other registers.

A worse alternative for broadcasting the sign bit of A to all positions:

   RLC     ; low bit of accumulator = previous sign bit
   CMA     ; Bitwise NOT: 0 for negative, 1 for non-negative
   ANI  1  ; isolate it, clearing higher bits
   DCR  A  ; 0 or 1  -> -1 or 0

This is even worse than rlc / sbb a; I include it only as an exercise in bit-manipulation to see why it works. (And because I'd already typed it up before remembering that the SBB trick I know from other ISAs will work here, too.)
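For the bit-manipulation exercise, the four-instruction sequence can be traced in C like this. It is a sketch that assumes RLC rotates bit 7 into bit 0; the name `sign_mask_rlc_model` is made up for illustration.

```c
#include <stdint.h>

/* Models RLC / CMA / ANI 1 / DCR A on an 8-bit accumulator value. */
uint8_t sign_mask_rlc_model(uint8_t a)
{
    a = (uint8_t)((a << 1) | (a >> 7));  /* rlc: old sign bit lands in bit 0 */
    a = (uint8_t)~a;                     /* cma: bit 0 is now 0 for negative */
    a &= 1u;                             /* ani 1: keep only that bit */
    a = (uint8_t)(a - 1u);               /* dcr a: 0 -> 0xFF, 1 -> 0x00 */
    return a;
}
```

The inversion before the decrement is what makes negative inputs end at 0xFF rather than 0x00.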
