Sum of Squares in MIPS Assembly
Question
I have to calculate the sum of squares of an array in MIPS assembly, and I am looking for feedback on my code. The code posted below is just a start and doesn't take possible overflow situations into account. However, I just want to make sure this basic block of code works as expected.
# Function to calculate squared sum
#
# Inputs:
# $a0: address of array in memory (*a)
# $a1: array size (n)
#
# Outputs:
# $v0: low 32 bytes of result
# $v1: high 32 bytes of result
#
# If the array is empty, then the function returns zero.
#
squareSum:
Loop:
sltu $t5,$t4,$a1 # if(i<n)
beq $t5,$zero,Exit # if ($t5 == 0)
sll $t0,$t4,2 # temporary register: 4 * i
addu $t0,$t0,$a0 # $t0 = array base address + 4 * i, i.e. the address of array[i]
lw $t1,0($t0) # load the current array element into $t1
multu $t1,$t1 # array[i] * array[i]
mflo $v0 # 32 least significant bits of multiplication to $v0
mfhi $v1 # 32 most significant bits of multiplication to $v1
addu $t2,$t2,$v0
addu $t3,$t3,$v1
addiu $v0,$zero,0
addiu $v1,$zero,0
addiu $t4,$t4,1 # i += 1
j Loop
Exit:
add $v0,$zero,$t2
add $v1,$zero,$t3
jr $ra
I am not sure this is how I should handle the lo and hi of this multiplication, so I would like to hear some suggestions and tips.
Recommended answer
I am not sure this is how I have to handle the lo and hi of this multiplication
When you are not sure, prepare a short piece of code that exercises the instructions you are unsure about with different input values, then step through it in a debugger to clear up any confusion.
Your current usage of lo/hi after the multiplication looks correct to me and works as expected in the MARS simulator.
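As a minimal sketch of such an experiment (my own illustration, assuming MARS or SPIM; the input values and register choices are arbitrary), the following program runs multu on a few inputs so the HI/LO results can be inspected in the register view while single-stepping. The final mult is included only to contrast signed with unsigned multiplication.

```mips
        .text
main:
        li    $t0, 3
        multu $t0, $t0           # 3 * 3 = 9: expect HI = 0, LO = 9
        mflo  $t1
        mfhi  $t2

        li    $t0, 65536
        multu $t0, $t0           # 2^16 * 2^16 = 2^32: expect HI = 1, LO = 0
        mflo  $t3
        mfhi  $t4

        li    $t0, -1            # bit pattern 0xFFFFFFFF
        multu $t0, $t0           # unsigned: 0xFFFFFFFF^2 = 0xFFFFFFFE00000001
        mflo  $t5                # expect 0x00000001
        mfhi  $t6                # expect 0xFFFFFFFE
        mult  $t0, $t0           # signed: (-1) * (-1) = 1
        mflo  $t7                # expect 0x00000001
        mfhi  $t8                # expect 0x00000000

        li    $v0, 10            # exit syscall in MARS/SPIM
        syscall
```

The last pair shows that multu and mult agree on LO but not on HI, so the choice between them matters as soon as the array may hold negative values.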
Using the debugger often, and trying out every small piece of new code as you add it, will make your progress much easier. Searching for a bug or a logic problem in a big chunk of new code is often more painful than the slightly tedious routine of debugging every new 4-10 lines right after writing them. You can put breakpoints where you need to stop in the MARS simulator, and the SPIM family of tools has similar features. I'm not sure what the tools look like on other MIPS platforms; with a regular MIPS Linux + GNU toolchain you certainly have gdb available, which is not as simple to learn as MARS but is much more powerful and complete.
Judging by the way your source uses branches without accounting for the branch delay slot, you are probably working with the MARS/SPIM simulator with the "delayed branching" option OFF. On a real MIPS CPU, the first instruction after any jump or branch is still executed, even when the branch is taken. So on real MIPS you have to account for that, either by adding a nop after each jump to neutralize this behaviour or, for best performance, by reorganizing your code so that the branch delay slot holds an actual meaningful instruction.
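To illustrate both options, here is a small sketch of my own (not the asker's code) that sums a word array and is written for "delayed branching" ON. The names sumWords and values are made up for the example. The jal, beq and jr delay slots are neutralized with nop, while the bne delay slot does useful work.

```mips
        .text
main:
        la    $a0, values
        li    $a1, 3
        jal   sumWords
        nop                      # delay slot of jal, neutralized
        addu  $s0, $v0, $zero    # keep the result: $s0 should hold 60
        li    $v0, 10            # exit syscall in MARS/SPIM
        syscall

# $a0 = address of word array, $a1 = element count, returns sum in $v0
sumWords:
        addiu $v0, $zero, 0      # sum = 0
        sll   $a1, $a1, 2        # n * 4
        addu  $a1, $a1, $a0      # a1 = end address of the array
        beq   $a0, $a1, sumWords_exit
        nop                      # delay slot of beq, neutralized
sumWords_loop:
        lw    $t0, 0($a0)        # t0 = next element
        addiu $a0, $a0, 4        # advance the array pointer
        bne   $a0, $a1, sumWords_loop
        addu  $v0, $v0, $t0      # delay slot: runs on every pass, taken or not
sumWords_exit:
        jr    $ra
        nop                      # delay slot of jr, neutralized

        .data
values: .word 10, 20, 30
```

Only single real instructions belong in a delay slot; a pseudo-instruction that expands to several instructions would be split across the branch. With "delayed branching" OFF this version computes the wrong sum, because the addu after bne is skipped whenever the branch is taken.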
One thing I don't like about your code is that it does not initialize its local variables as needed, for example t4, t2, t3. That makes your function usable once at most, because on the second call the registers will already hold unexpected values. Maybe you left the initializers out to keep the question brief, but in my eyes that is a plain bug. They should be part of even a simplified, minimized example, to show that you thought your code through and understand how it operates (and that it really needs those values).
Some more hints to make the code a bit more "optimal" and simpler: why not keep the running sum directly in v0 and v1, and store the multiplication result in temporaries instead? That way you avoid the final move of the result.
You can also simplify the array address calculation in each iteration: update the address with address += 4 instead of computing the full (array + i*4) every time (at least you used a shift for the *4, good). If you calculate the end address ahead of the loop, you can then build the whole loop condition as a bne of addresses.
You have many typos in your comments, for example "32 bytes" instead of "32 bits", and similar. I would also use more explicit labels, because "Loop" will probably clash with any other "Loop" in somewhat larger code.
For fun I tried to follow my own hints and rewrote the code more to "my taste". This is the result, which also fixes the overflow situation (tried in MARS with "delayed branching" OFF; to check the resulting v0:v1 value, put a breakpoint after each jal):
main: # test the subroutine
la $a0, testArr
li $a1, 4
jal squareSum
# v0:v1 = 14 (0*0 + 1*1 + 2*2 + 3*3)
# more complex input, testing 64 bit results and overflow
la $a0, testArr
li $a1, 7
jal squareSum
# v0 = 0x000E000F, v1 = 0x00000101 (sum = 1103807512591, with a carry into the high word)
# terminate app
li $v0, 10
syscall
# Function to calculate squared sum
#
# Inputs:
# $a0: address of word array in memory (*a)
# $a1: array size (n)
#
# Outputs:
# $v0: low 32 bits of result
# $v1: high 32 bits of result
#
# If the array is empty, then the function returns zero.
#
squareSum:
# result = 0
addiu $v0, $zero,0
addiu $v1, $zero,0
# calculate end() pointer of array (for loop condition)
sll $a1, $a1, 2 # n * 4
addu $a1, $a1, $a0 # a1 = array.end() address (a0 = array.begin())
beq $a0, $a1, squareSum_exit # begin() == end() => empty array
squareSum_summing:
# load next array element and calculate its square
lw $t0, 0($a0) # t0 = array[i]
addiu $a0, $a0, 4 # advance the array pointer
multu $t0, $t0 # array[i] * array[i]
mflo $t0 # t0 = 32 least significant bits of multiplication
mfhi $t1 # t1 = 32 most significant bits of multiplication
# add square value to the result
addu $v0, $v0, $t0
addu $v1, $v1, $t1
# handle unsigned addition overflow
sltu $t1, $v0, $t0 # t1 = 0/1 correction ((x+y) < y)
addu $v1, $v1, $t1 # add correction to the result
# loop while array_ptr != array.end()
bne $a0, $a1, squareSum_summing
squareSum_exit:
jr $ra
.data
testArr: .word 0, 1, 2, 3, 65535, 1024, 1048576
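The overflow handling in the loop relies on a property of unsigned addition: when the 32-bit sum wraps around, the result is smaller than either addend, so sltu yields exactly the carry bit. The standalone sketch below (my own illustration for MARS/SPIM; the operands and the use of $s0/$s1 for the result are arbitrary choices) adds two 64-bit values held in register pairs the same way.

```mips
        .text
main:
        # first operand 0x00000000FFFFFFFF: low word in $t0, high word in $t1
        li    $t0, 0xFFFFFFFF
        li    $t1, 0
        # second operand 0x0000000000000001: low word in $t2, high word in $t3
        li    $t2, 1
        li    $t3, 0

        addu  $s0, $t0, $t2      # low words: wraps around to 0
        sltu  $t4, $s0, $t2      # carry = 1 because the wrapped sum < addend
        addu  $s1, $t1, $t3      # high words
        addu  $s1, $s1, $t4      # add the carry: $s1 = 1, $s0 = 0

        li    $v0, 10            # exit syscall in MARS/SPIM
        syscall
```

Changing the first operand's low word to a small value such as 5 makes sltu produce 0, which shows the no-carry case.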