Sum of Squares in MIPS Assembly


Question

I have to calculate the sum of squares of an array in MIPS assembly, and I am looking for feedback on my code. The code posted below is just a start and does not yet take possible overflow into account. For now, I just want to make sure that this basic block of code works as expected.

# Function to calculate squared sum
# 
# Inputs:
#       $a0: address of array in memory (*a)
#       $a1: array size (n)
#
# Outputs:
#       $v0: low 32 bytes of result
#       $v1: high 32 bytes of result
#
# If the array is empty, then the function returns zero.
#

squareSum:



    Loop:

        sltu $t5,$t4,$a1    #   if(i<n)
        beq $t5,$zero,Exit  #   if ($t5 == 0)



        sll $t0,$t4,2   #   temporary register = 4 * i
        addu $t0,$t0,$a0    #   $t0 points to the memory address of the array plus 4 * i, so we have the address of array[i]
        lw $t1,0($t0)   #   load into $t1 the array value we want at this point
        multu $t1,$t1   #   array[i] * array[i]
        mflo $v0    #   32 least significant bits of multiplication to $v0
        mfhi $v1    #   32 most significant bits of multiplication to $v1

        addu $t2,$t2,$v0
        addu $t3,$t3,$v1

        addiu $v0,$zero,0
        addiu $v1,$zero,0

        addiu $t4,$t4,1 #   i += 1

        j Loop


        Exit:
            add $v0,$zero,$t2
            add $v1,$zero,$t3
            jr $ra

I am not sure this is how I should handle the lo and hi results of the multiplication, so I would like to hear some suggestions and tips.

Recommended answer

I am not sure this is how I have to handle the lo and hi of this multiplication

When you are not sure, prepare a short piece of code that exercises the instructions in question with different input values, then step through it in a debugger to clear up any confusion.
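
As an illustration of that advice, here is a minimal scratch program for stepping through in MARS or SPIM. It is only a sketch: the input values are my own choice, not from the question, and the register values in the comments are what the arithmetic should produce, so check them in the register pane rather than taking them on trust.

```mips
# Scratch program: step through it and watch hi/lo and $t1..$t6.
        .text
main:
        li      $t0, 65536      # 0x00010000
        multu   $t0, $t0        # 2^16 * 2^16 = 2^32
        mflo    $t1             # lo = 0x00000000
        mfhi    $t2             # hi = 0x00000001

        li      $t0, -1         # bit pattern 0xFFFFFFFF
        multu   $t0, $t0        # unsigned: 4294967295 * 4294967295
        mflo    $t3             # lo = 0x00000001
        mfhi    $t4             # hi = 0xFFFFFFFE

        mult    $t0, $t0        # signed: (-1) * (-1) = 1
        mflo    $t5             # lo = 0x00000001
        mfhi    $t6             # hi = 0x00000000

        li      $v0, 10         # exit syscall (MARS/SPIM)
        syscall
```

The last two cases use the same bit pattern and differ only in mult versus multu, which makes it easy to see what "unsigned" means for the hi word.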

Your current usage of lo/hi after the multiplication looks correct to me and works as expected in the MARS simulator.

Use the debugger often, and try out every small piece of new code as you add it; this will make your progress much easier. Searching for a bug or a logic problem in a big chunk of new code is usually more painful than the slightly tedious routine of debugging every 4-10 new lines right after writing them. In the MARS simulator you can put breakpoints wherever you need to stop, and the SPIM family of tools has similar features. I'm not sure what the tools look like for other MIPS platforms; with a regular MIPS Linux + GNU toolchain you certainly have gdb available, which is not as simple to learn as MARS but is much more powerful and complete.

Judging by the way your source uses branches without accounting for the branch delay slot, you are probably working with the MARS/SPIM simulator with the "delayed branching" option OFF. On a real MIPS CPU the first instruction after any jump or branch is still executed, even when the branch is taken. On real MIPS you therefore have to account for that, either by adding a nop after each jump to neutralize this behaviour or, for best performance, by reorganizing your code so that the branch delay slot holds an actual meaningful instruction.
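
A small experiment makes the delay slot visible. This is a sketch for MARS, assuming its "delayed branching" setting behaves as described above; the label name skip and the constants are mine.

```mips
# Run once with "delayed branching" OFF and once with it ON,
# then compare $t0 when execution reaches skip.
        .text
main:
        li      $t0, 0
        j       skip
        addiu   $t0, $t0, 1     # delay slot: executed only when delayed branching is ON
        addiu   $t0, $t0, 10    # never executed, the jump skips it
skip:
        # delayed branching OFF -> $t0 == 0
        # delayed branching ON  -> $t0 == 1
        li      $v0, 10         # exit syscall (MARS/SPIM)
        syscall
```

Replacing the first addiu with a nop gives the same result in both modes, which is the "neutralize" option mentioned above.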

One thing I don't like about your code is that it does not initialize its local variables, for example t4, t2 and t3. That makes your function usable only once at most, since on a second call the registers will already hold unexpected values. Maybe you left the initializers out to keep the question short, but in my eyes that is a plain bug; they should be part of even a simplified, minimized example, to show that you thought your code through and understand how it operates (and that it really needs those values).
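
For illustration, this is roughly what the original loop looks like with only the initializers added. It is a sketch, not a finished version: the main routine and the testArr data are my own test scaffolding, the redundant clearing of v0/v1 inside the loop is dropped, and the carry out of the low word is still ignored, exactly as in the question.

```mips
        .text
main:   # call the function twice to confirm both calls give the same result
        la      $a0, testArr
        li      $a1, 4
        jal     squareSum       # expect $v0 = 14, $v1 = 0
        la      $a0, testArr
        li      $a1, 4
        jal     squareSum       # expect $v0 = 14, $v1 = 0 again
        li      $v0, 10         # exit syscall (MARS/SPIM)
        syscall

squareSum:
        addiu   $t4, $zero, 0   # i = 0
        addiu   $t2, $zero, 0   # low word of running sum = 0
        addiu   $t3, $zero, 0   # high word of running sum = 0
Loop:
        sltu    $t5, $t4, $a1   # t5 = (i < n)
        beq     $t5, $zero, Exit
        sll     $t0, $t4, 2     # t0 = 4 * i
        addu    $t0, $t0, $a0   # t0 = address of array[i]
        lw      $t1, 0($t0)     # t1 = array[i]
        multu   $t1, $t1        # array[i] * array[i]
        mflo    $v0             # low 32 bits of the square
        mfhi    $v1             # high 32 bits of the square
        addu    $t2, $t2, $v0   # carry out of the low word is NOT handled here
        addu    $t3, $t3, $v1
        addiu   $t4, $t4, 1     # i += 1
        j       Loop
Exit:
        addu    $v0, $zero, $t2
        addu    $v1, $zero, $t3
        jr      $ra

        .data
testArr: .word  0, 1, 2, 3
```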

Some more hints to make the code a bit more "optimal" and simpler: why not keep the running sum directly in v0 and v1, and store the multiplication result in temporaries instead? That avoids moving the result at the end.

You can also simplify the array address calculation in each iteration: update the address with address += 4 instead of computing the full (array + i*4) every time (at least you used a shift for the *4, good). If you calculate the end address ahead of the loop, you can then build the whole loop condition as a bne of two addresses.

Your comments contain several typos, for example "32 bytes" instead of "32 bits". I would also use more explicit labels, because "Loop" will probably clash with any other "Loop" in somewhat larger code.

For fun I tried to follow my own hints and rewrote the code more to "my taste". This is the result, which also fixes the overflow situation (tried in MARS with "delayed branching" OFF; to check the resulting v1:v0 value, put a breakpoint after each jal):

main:   # test the subroutine
        la      $a0, testArr
        li      $a1, 4
        jal     squareSum
        # v1:v0 = 14 (0*0 + 1*1 + 2*2 + 3*3), i.e. v0 = 14, v1 = 0

        # more complex input, testing 64 bit results and overflow
        la      $a0, testArr
        li      $a1, 7
        jal     squareSum
        # v1:v0 = 0x00000101:0x000E000F (1103807512591 decimal)

        # terminate app
        li      $v0, 10
        syscall

# Function to calculate squared sum
#
# Inputs:
#       $a0: address of word array in memory (*a)
#       $a1: array size (n)
#
# Outputs:
#       $v0: low 32 bits of result
#       $v1: high 32 bits of result
#
# If the array is empty, then the function returns zero.
#

squareSum:
        # result = 0
        addiu   $v0, $zero,0
        addiu   $v1, $zero,0
        # calculate end() pointer of array (for loop condition)
        sll     $a1, $a1, 2     # n * 4
        addu    $a1, $a1, $a0   # a1 = array.end() address (a0 = array.begin())
        beq     $a0, $a1, squareSum_exit    # begin() == end() => empty array
squareSum_summing:
        # load next array element and calculate its square
        lw      $t0, 0($a0)     # t0 = array[i]
        addiu   $a0, $a0, 4     # advance the array pointer
        multu   $t0, $t0        # array[i] * array[i]
        mflo    $t0             # t0 = 32 least significant bits of multiplication
        mfhi    $t1             # t1 = 32 most significant bits of multiplication
        # add square value to the result
        addu    $v0, $v0, $t0
        addu    $v1, $v1, $t1
        # handle unsigned addition overflow
        sltu    $t1, $v0, $t0   # t1 = 0/1 correction ((x+y) < y)
        addu    $v1, $v1, $t1   # add correction to the result
        # loop while array_ptr != array.end()
        bne     $a0, $a1, squareSum_summing
squareSum_exit:
        jr      $ra

.data
testArr:  .word   0, 1, 2, 3, 65535, 1024, 1048576
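
The carry handling in squareSum_summing can also be tried in isolation. The sketch below uses values of my own choosing (they match the point where the second test wraps the low word) and $s0/$s1 in place of $v0/$v1 so that the exit syscall does not overwrite the result.

```mips
# 64-bit add of $t1:$t0 into $s1:$s0 using only 32-bit unsigned adds.
        .text
main:
        li      $s0, 0xFFFE000F     # low word of running sum
        li      $s1, 0              # high word of running sum
        li      $t0, 0x00100000     # low word of value to add
        li      $t1, 0              # high word of value to add

        addu    $s0, $s0, $t0       # low word wraps around to 0x000E000F
        addu    $s1, $s1, $t1       # high words added, no carry yet
        sltu    $t2, $s0, $t0       # t2 = 1 because (x + y) < y means the add wrapped
        addu    $s1, $s1, $t2       # s1 = 1 after the carry is applied

        li      $v0, 10             # exit syscall (MARS/SPIM)
        syscall
```

The comparison works because an unsigned 32-bit sum can only end up smaller than one of its operands when it overflowed.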
