可以用 imul 指令执行多精度有符号乘法吗? [英] can multiprecision signed multiply be performed with imul instruction?

查看:72
本文介绍了可以用 imul 指令执行多精度有符号乘法吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在编写一个函数库,为有符号整数类型 s0128s0256s0512 提供所有常规运算符和函数>s1024 和浮点类型 f0128f0256f0512f1024.

I am writing a function library to provide all conventional operators and functions for signed-integer types s0128, s0256, s0512, s1024 and floating-point types f0128, f0256, f0512, f1024.

我现在正在编写 s0128s0256s0512s1024 乘法例程,但我写错了令我困惑的结果.我假设我可以用 64 位 imul rcx 指令(在 rdx:rax 中产生 128 位结果)级联乘法,就像我可以做的一样使用带有 mul rcx 指令的无符号操作数......但 imul 的答案是错误的.

I am writing the s0128, s0256, s0512, s1024 multiply routines now, but am getting erroneous results that confuse me. I assumed I could cascade multiplies with the 64-bit imul rcx instruction (that produces a 128-bit result in rdx:rax) in the same way I could do the same with unsigned operands with the mul rcx instruction... but the answers with imul are wrong.

我怀疑有一些技巧可以使这个工作,也许混合 imulmul 指令 - 或者其他东西.或者有什么原因不能用带符号的乘法指令实现更大的乘法?

I suspect there is some trick to make this work, maybe mix imul and mul instructions - or something. Or is there some reason one cannot implement larger multiplies with signed multiply instructions?

所以你理解了这个技术,我将描述 s0128 操作数的最小版本.

So you understand the technique, I'll describe the smallest version, for s0128 operands.

           arg2.1   arg2.0  : two 64-bit parts of s0128 operand
           arg1.1   arg1.0  : two 64-bit parts of s0128 operand
           ---------------
       0  out.edx  out.eax  : output of arg1.0 * arg2.0
 out.edx  out.eax           : output of arg1.0 * arg2.1
 -------------------------
 out.2    out.1    out.0    : sum the above intermediate results
 out.edx  out.eax           : output of arg1.1 * arg2.0
 -------------------------
 out.2    out.1    out.0    : sum the above intermediate results

每次代码乘以两个 64 位值时,都会在 edx:eax 中生成一个 128 位结果.每次代码生成 128 位结果时,它都会将该结果与 addqadcqadcq 的 64 位寄存器的累加三元组相加> 指令(其中最后的 adcq 指令仅添加零以确保传播任何进位标志).

Each time the code multiplies two 64-bit values, it generates a 128-bit result in edx:eax. Each time the code generates a 128-bit result, it sums that result into an accumulating triple of 64-bit registers with addq, adcq, adcq instructions (where the final adcq instruction only adds zero to assure any carry flags gets propagated).

当我将小的负数乘以小的正数作为测试时,结果是负数,但是在128位中的上位64位值的底部有一两个非零位s0128 结果.这对我来说意味着多精度有符号乘法中的传播不太正确.

When I multiply small negative numbers by small positive numbers as a test, the result is negative, but there are one or two non-zero bits at the bottom of the upper 64-bit value in the 128-bit s0128 result. This implies to me that something isn't quite right with propagation in multiprecision signed multiplies.

当然,s0256s0512s1024 的级联要广泛得多.

Of course the cascade is quite a bit more extensive for s0256, s0512, s1024.

我错过了什么?我是否必须将两个操作数都转换为无符号数,执行无符号乘法,然后在一个(但不是两个)操作数为负数时取反结果?或者我可以使用 imul 有符号乘法指令计算多精度结果吗?

What am I missing? Must I convert both operands to unsigned, perform unsigned multiply, then negate the result if one (but not both) of the operands was negative? Or can I compute multiprecision results with the imul signed multiply instruction?

推荐答案

当你用更小的乘法构建一个扩展精度的有符号乘法时,你最终会得到有符号和无符号算术的混合.

When you build an extended precision signed multiply out of smaller multiplies, you end up with a mixture of signed and unsigned arithmetic.

特别是,如果您将一个有符号值分成两半,则将上半部分视为有符号值,将下半部分视为无符号值.事实上,对于扩展精度加法也是如此.

In particular, if you break a signed value in half, you treat the upper half as signed, and the lower half as unsigned. The same is true for extended precision addition, in fact.

考虑这个任意的例子,其中AHAL分别代表ABH 和 BL 代表 B 的高低半部分.(注意:这些并不意味着代表 x86 寄存器的一半,只是被乘数的一半.)L 项是无符号的,而 H 项是有符号的.

Consider this arbitrary example, where AH and AL represent the high and low halves of A, and BH and BL represent the high and low halves of B. (Note: these aren't meant to represent x86 register halves, just halves of a multiplicand.) The L terms are unsigned and the H terms are signed.

              AH : AL
           x  BH : BL
  -------------------
              AL * BL    unsigned x unsigned => zero extend to full precision
         AH * BL           signed x unsigned => sign extend to full precision
         AL * BH         unsigned x   signed => sign extend to full precision
    AH * BH                signed x   signed

AL * BL 乘积是无符号的,因为 AL 和 BL 都是无符号的.因此,当您将其提升到结果的完全精度时,它会得到零扩展.

The AL * BL product is unsigned because both AL and BL are unsigned. Therefore, it gets zero extended when you promote it to the full precision of the result.

AL * BHAH * BL 产品混合了有符号和无符号值.生成的产品已签名,当您将其提升到结果的完全精度时,需要对其进行签名扩展.

The AL * BH and AH * BL products mix signed and unsigned values. The resulting product is signed, and that needs to be sign extended when you promote it to the full precision of the result.

以下 C 代码演示了以 16×16 乘法实现的 32×32 乘法.同样的原则适用于构建 64×64 乘法中的 128×128 乘法.

The following C code demonstrates a 32×32 multiply implemented in terms of 16×16 multiplies. The same principle applies when building 128×128 multiplies out of 64×64 multiplies.

#include <stdint.h>
#include <stdio.h>

int64_t mul32x32( int32_t x, int32_t y )
{
    int16_t x_hi = 0xFFFF & (x >> 16);
    int16_t y_hi = 0xFFFF & (y >> 16);

    uint16_t x_lo = x & 0xFFFF;
    uint16_t y_lo = y & 0xFFFF;


    uint32_t lo_lo = (uint32_t)x_lo * y_lo;    // unsigned x unsigned
    int32_t  lo_hi = (x_lo * (int32_t)y_hi);   // unsigned x   signed
    int32_t  hi_lo = ((int32_t)x_hi * y_lo);   //   signed x unsigned
    int32_t  hi_hi = ((int32_t)x_hi * y_hi);   //   signed x   signed


    int64_t  prod = lo_lo 
                  + (((int64_t)lo_hi + hi_lo) << 16) 
                  + ((int64_t)hi_hi << 32);

    return prod;
}

int check(int a, int b)
{
    int64_t ref = (int64_t)a * (int64_t)b;
    int64_t tst = mul32x32(a, b);

    if (ref != tst)
    {
        printf("%.8X x %.8X => %.16llX vs %.16llX\n",
                (unsigned int)a,         (unsigned int)b, 
                (unsigned long long)ref, (unsigned long long)tst);
        return 1;
    }

    return 0;
}


int main()
{
    int a = (int)0xABCDEF01;
    int b = (int)0x12345678;
    int c = (int)0x1234EF01;
    int d = (int)0xABCD5678;

    int fail = 0;

    fail += check(a, a);
    fail += check(a, b);
    fail += check(a, c);
    fail += check(a, d);

    fail += check(b, b);
    fail += check(b, c);
    fail += check(b, d);

    fail += check(c, c);
    fail += check(c, d);

    fail += check(d, d);

    printf("%d tests failed\n", fail);
    return 0;
}

即使您将被乘数分成两部分以上,此模式也会扩展.也就是说,只有签名数字中最重要的部分才被视为已签名.所有其他作品均未签名.考虑这个例子,它将每个被乘数分成 3 个部分:

This pattern extends even if you break the multiplicands into more than two pieces. That is, only the most-significant piece of a signed number gets treated as signed. All of the other pieces are unsigned. Consider this example, which divides each multiplicand into 3 pieces:

                      A2 : A1 : A0
                   x  B2 : B1 : B0
  ---------------------------------
                           A0 * B0    => unsigned x unsigned   => zero extend
                      A1 * B0         => unsigned x unsigned   => zero extend
                 A2 * B0              =>   signed x unsigned   => sign extend
                      A0 * B1         => unsigned x unsigned   => zero extend
                 A1 * B1              => unsigned x unsigned   => zero extend
            A2 * B1                   =>   signed x unsigned   => sign extend
                 A0 * B2              => unsigned x   signed   => sign extend
            A1 * B2                   => unsigned x   signed   => sign extend
       A2 * B2                        =>   signed x   signed

由于所有混合签名和符号扩展的乐趣,实现签名的 × 通常更容易.有符号乘法作为无符号 ×无符号乘法,如果被乘数不同,则在最后有条件地否定.(而且,事实上,当您使用扩展精度浮点数时,只要您保持像 IEEE-754 这样的符号幅度形式,您就不必处理有符号乘法.)

Because of all the mixed-signedness and sign extension fun, it's often just easier to implement a signed × signed multiply as an unsigned × unsigned multiply, and conditionally negate at the end if the signs if the multiplicands differ. (And, in fact, when you get to the extended precision float, as long as you stay in sign-magnitude form like IEEE-754, you won't have to deal with signed multiply.)

这个汇编宝石 展示了如何有效地否定扩展精度值.(gems 页面 有点过时,但您可能会觉得它很有趣/有用.)

This assembly gem shows how to negate extended precision values efficiently. (The gems page is a little dated, but you may find it interesting / useful.)

这篇关于可以用 imul 指令执行多精度有符号乘法吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆