64-bit sum of squares of an array of 32-bit integers on MIPS32


Question

I have to calculate the sum of squares of an array in MIPS assembly, and I am looking for feedback on my code. The code posted below is just a start and does not yet handle possible overflow. For now I just want to make sure this basic block of code works as expected.

# Function to calculate squared sum
# 
# Inputs:
#       $a0: address of array in memory (*a)
#       $a1: array size (n)
#
# Outputs:
#       $v0: low 32 bytes of result
#       $v1: high 32 bytes of result
#
# If the array is empty, then the function returns zero.
#

squareSum:



    Loop:

        sltu $t5,$t4,$a1    #   if(i<n)
        beq $t5,$zero,Exit  #   if ($t5 == 0)



        sll $t0,$t4,2   #   temporary register = 4 * i
        addu $t0,$t0,$a0    #   $t0 = array base address + 4 * i, i.e. the address of array[i]
        lw $t1,0($t0)   #   load the current array element into $t1
        multu $t1,$t1   #   array[i] * array[i]
        mflo $v0    #   32 least significant bits of multiplication to $v0
        mfhi $v1    #   32 most significant bits of multiplication to $v1

        addu $t2,$t2,$v0
        addu $t3,$t3,$v1

        addiu $v0,$zero,0
        addiu $v1,$zero,0

        addiu $t4,$t4,1 #   i += 1

        j Loop


        Exit:
            add $v0,$zero,$t2
            add $v1,$zero,$t3
            jr $ra

I am not sure this is the right way to handle the lo and hi results of the multiplication, so I would like to hear some suggestions and tips.

Recommended answer

"I am not sure this is how I have to handle the lo and hi of this multiplication"

When you are not sure, prepare a short piece of code that exercises the instructions you are unsure about with different input values, then step through it in a debugger to clear up any confusion.
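As an illustration of that advice, here is a minimal scratch program (not part of the original question or answer) that could be single-stepped in MARS to watch what multu leaves in hi and lo. The three input values are arbitrary choices for this sketch, and the exit syscall assumes the MARS/SPIM environment.

```mips
# Scratch program for stepping through multu / mflo / mfhi in MARS.
# The input values are illustrative; watch $t1 (lo) and $t2 (hi) after each group.
        .text
main:
        li      $t0, 3
        multu   $t0, $t0            # 3 * 3 = 9
        mflo    $t1                 # lo = 0x00000009
        mfhi    $t2                 # hi = 0x00000000

        li      $t0, 0x00010000     # 65536 = 2^16
        multu   $t0, $t0            # 2^16 * 2^16 = 2^32
        mflo    $t1                 # lo = 0x00000000
        mfhi    $t2                 # hi = 0x00000001

        li      $t0, 0xFFFFFFFF     # largest unsigned 32-bit value
        multu   $t0, $t0            # (2^32 - 1)^2 = 0xFFFFFFFE00000001
        mflo    $t1                 # lo = 0x00000001
        mfhi    $t2                 # hi = 0xFFFFFFFE

        li      $v0, 10             # MARS/SPIM syscall 10: exit
        syscall
```

Note that multu treats both operands as unsigned, so the last case is the square of 4294967295, not of -1.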

Your current usage of lo/hi after the multiplication looks correct to me, and it works as expected in the MARS simulator.

Using the debugger often, and trying out each small piece of new code as you add it, will make your progress much easier. Searching for a bug or a logic problem in a big chunk of new code is usually more painful than the slightly tedious routine of debugging every 4-10 new lines right after writing them. In the MARS simulator you can put breakpoints wherever you need to stop, and the SPIM family of tools has similar features. I'm not sure what the tools look like on other MIPS platforms, but on regular MIPS Linux with the GNU toolchain you certainly have gdb available; it is not as simple to learn as MARS, although it is much more powerful and complete.

Judging by the way your source uses branches without accounting for the branch delay slot, you are probably working with the MARS/SPIM simulator with the "delayed branching" option OFF. On a real MIPS CPU the first instruction after any jump or branch is still executed, even when the branch is taken. So on real MIPS you have to account for that, either by adding a nop after each jump to neutralize this behaviour or, for best performance, by reorganizing your code so that the branch delay slot holds an instruction that does useful work.
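To make the delay-slot point concrete, here is a small sketch that only behaves correctly with "delayed branching" ON in MARS (or on hardware with delay slots). The sumWords routine and the words array are hypothetical names invented for this example. It shows both options: a nop after jal and jr, and a useful instruction in the slot of the loop branch. With the GNU assembler you would typically also need .set noreorder so that the assembler does not rearrange the slots itself.

```mips
# Assumes "Delayed branching" is ON in MARS (or a CPU with branch delay slots).
        .data
words:  .word   1, 2, 3, 4

        .text
main:
        la      $a0, words          # begin pointer
        addiu   $a1, $a0, 16        # end pointer = begin + 4 words * 4 bytes
        jal     sumWords
        nop                         # delay slot of jal, neutralized with nop
        # breakpoint here: $v0 = 10 (1 + 2 + 3 + 4)
        li      $v0, 10             # MARS/SPIM syscall 10: exit
        syscall

# Hypothetical helper: $v0 = sum of the words in [$a0, $a1).
# The range must not be empty.
sumWords:
        addiu   $v0, $zero, 0       # sum = 0
sumWords_loop:
        lw      $t0, 0($a0)         # t0 = *ptr
        addiu   $a0, $a0, 4         # ptr += 4
        bne     $a0, $a1, sumWords_loop
        addu    $v0, $v0, $t0       # delay slot: runs every iteration, taken or not
        jr      $ra
        nop                         # delay slot of jr
```

With "delayed branching" OFF the addu after bne would be skipped whenever the branch is taken, so this version is deliberately not interchangeable with the code in the question.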

One thing I don't like about your code is that it does not initialize its local variables, for example t4, t2 and t3. That makes the function usable at most once, because on a second call the registers will already hold leftover values. Maybe you left the initializers out for the brevity of your question, but in my eyes that is a plain bug: they should be part of even a simplified, minimized example, to show that you have thought your code through and understand how it operates (and that it really needs those values).

Some more hints to make the code a bit more "optimal" and simpler: why not keep the running sum directly in v0 and v1, and put the multiplication result into temporaries instead? That avoids the final move of the result.

You can also simplify the array address calculation in each iteration: update the pointer with address += 4 instead of recomputing the full (array + i*4) every time (at least you used a shift for the *4, good). If you calculate the end address ahead of the loop, the whole loop condition becomes a bne between two addresses.

There are many typos in your comments, for example "32 bytes" instead of "32 bits". I would also use more explicit labels, because "Loop" will probably clash with some other "Loop" in any larger program.

For fun I followed my own hints and rewrote the code more to "my taste". This is the result, which also fixes the overflow situation. I tried it in MARS with "delayed branching" OFF; to check the result in v0 (low word) and v1 (high word), put a breakpoint after each jal:

main:   # test the subroutine
        la      $a0, testArr
        li      $a1, 4
        jal     squareSum
        # result = 14 (0*0 + 1*1 + 2*2 + 3*3): $v0 = 14 (low), $v1 = 0 (high)

        # more complex input, testing 64 bit results and overflow
        la      $a0, testArr
        li      $a1, 7
        jal     squareSum
        # result = 1103807512591: $v0 = 0x000E000F (low), $v1 = 0x00000101 (high)

        # terminate app
        li      $v0, 10
        syscall

# Function to calculate squared sum
#
# Inputs:
#       $a0: address of word array in memory (*a)
#       $a1: array size (n)
#
# Outputs:
#       $v0: low 32 bits of result
#       $v1: high 32 bits of result
#
# If the array is empty, then the function returns zero.
#

squareSum:
        # result = 0
        addiu   $v0, $zero,0
        addiu   $v1, $zero,0
        # calculate end() pointer of array (for loop condition)
        sll     $a1, $a1, 2     # n * 4
        addu    $a1, $a1, $a0   # a1 = array.end() address (a0 = array.begin())
        beq     $a0, $a1, squareSum_exit    # begin() == end() => empty array
squareSum_summing:
        # load next array element and calculate its square
        lw      $t0, 0($a0)     # t0 = array[i]
        addiu   $a0, $a0, 4     # advance the array pointer
        multu   $t0, $t0        # array[i] * array[i]
        mflo    $t0             # t0 = 32 least significant bits of multiplication
        mfhi    $t1             # t1 = 32 most significant bits of multiplication
        # add square value to the result
        addu    $v0, $v0, $t0
        addu    $v1, $v1, $t1
        # handle unsigned addition overflow
        sltu    $t1, $v0, $t0   # t1 = 0/1 correction ((x+y) < y)
        addu    $v1, $v1, $t1   # add correction to the result
        # loop while array_ptr != array.end()
        bne     $a0, $a1, squareSum_summing
squareSum_exit:
        jr      $ra

.data
testArr:  .word   0, 1, 2, 3, 65535, 1024, 1048576
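
The carry handling in squareSum relies on a property of unsigned wrap-around addition: after s = x + y modulo 2^32, a carry occurred exactly when s < y. The sketch below isolates that check so it can be stepped through on its own in MARS; the operand values are arbitrary choices for this illustration.

```mips
# Stand-alone demonstration of the addu + sltu carry check used in squareSum.
        .text
main:
        # Case 1: the low-word addition wraps around
        li      $t0, 0xFFFFFFFF     # low word of the running sum
        li      $t1, 2              # low word being added
        addu    $t0, $t0, $t1       # 0xFFFFFFFF + 2 wraps to 0x00000001
        sltu    $t2, $t0, $t1       # 1 < 2, so $t2 = 1: carry into the high word

        # Case 2: no wrap-around
        li      $t0, 5
        li      $t1, 2
        addu    $t0, $t0, $t1       # 5 + 2 = 7
        sltu    $t3, $t0, $t1       # 7 < 2 is false, so $t3 = 0: no carry

        li      $v0, 10             # MARS/SPIM syscall 10: exit
        syscall
```

In squareSum the comparison has to use the low word of the square ($t0), which is why the correction is computed into $t1 only after the high word of the square has already been added to $v1.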
