Fastest way to multiply two 64-bit ints to 128-bit then >> to 64-bit?

Question

I need to multiply two signed 64-bit integers a and b together, then shift the (128-bit) result to a signed 64-bit integer. What's the fastest way to do that?

My 64-bit integers actually represent fixed-point numbers with fmt fractional bits. fmt is chosen so that (a * b) >> fmt does not overflow. For instance, with fmt == 56, abs(a) < 64<<fmt and abs(b) < 2<<fmt can never overflow 64 bits, because the final result is < 128<<fmt and therefore fits in an int64.
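The bound in this example can be checked by counting bits. A short worked derivation for fmt == 56, assuming the shift is an arithmetic right shift (so it rounds toward negative infinity):

```latex
\begin{align*}
|a| &< 64 \cdot 2^{56} = 2^{62}, \qquad |b| < 2 \cdot 2^{56} = 2^{57} \\
|a \cdot b| &< 2^{62} \cdot 2^{57} = 2^{119} \\
-2^{63} &\le \left\lfloor \frac{a \cdot b}{2^{56}} \right\rfloor \le 2^{63} - 1
\end{align*}
```

The full product needs up to 120 bits, which is why a 128-bit intermediate is required, while the shifted result stays within the int64 range.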

The reason I want to do that is to quickly and precisely evaluate quintic polynomials of the form ((((c5*x + c4)*x + c3)*x + c2)*x + c1)*x + c0 in fixed point format, with every number a signed 64-bit fixed-point number with fmt fractional bits. I'm looking for the most efficient way to achieve that.
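The Horner evaluation described here might look roughly like the sketch below. The names poly5_fixed and mulshift_i128 are hypothetical and not from the question. The multiply helper is a stand-in that relies on the __int128 extension of GCC and Clang (not standard C99) and assumes that >> on a negative __int128 is an arithmetic shift, which those compilers provide. Overflow of intermediates is the caller's responsibility.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in multiply: (a * b) >> s using the GCC/Clang __int128 extension.
   Assumption: right-shifting a negative __int128 is an arithmetic shift. */
int64_t mulshift_i128 (int64_t a, int64_t b, int s)
{
    assert (s >= 0 && s < 64);
    return (int64_t)(((__int128)a * b) >> s);
}

/* Evaluate ((((c5*x + c4)*x + c3)*x + c2)*x + c1)*x + c0 by Horner's rule.
   c[i] is the coefficient of x^i; all values carry fmt fractional bits.
   The caller must choose fmt so that no intermediate overflows int64_t. */
int64_t poly5_fixed (const int64_t c[6], int64_t x, int fmt)
{
    int64_t acc = c[5];
    for (int i = 4; i >= 0; i--) {
        acc = mulshift_i128 (acc, x, fmt) + c[i];
    }
    return acc;
}
```

With fmt == 56, all six coefficients equal to 1.0 (that is, 1 << 56) and x = 0.5 (1 << 55), every intermediate is exactly representable, so the result is 1.96875 with no rounding error. Each step costs one wide multiply, one shift, and one add, so the multiply routine dominates the running time.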

Recommended answer

As a commenter on the question pointed out, the efficient way to do this is with machine-dependent code rather than portable code. The asker states that the main platform is x86_64, which has a built-in instruction for 64 ✕ 64 → 128 bit multiplication (the one-operand form of imul for signed operands). It is easily accessed with a small piece of inline assembly. Note that the details of inline assembly may differ somewhat between compilers; the code below was built with the Intel C/C++ compiler.

#include <stdint.h>

/* compute mul_wide (a, b) >> s, for s in [0,63] */
int64_t mulshift (int64_t a, int64_t b, int s)
{
    int64_t res;
    __asm__ (
        "movq  %1, %%rax;\n\t"          // rax = a
        "movl  %3, %%ecx;\n\t"          // ecx = s
        "imulq %2;\n\t"                 // rdx:rax = a * b
        "shrdq %%cl, %%rdx, %%rax;\n\t" // rax = int64_t (rdx:rax >> s)
        "movq  %%rax, %0;\n\t"          // res = rax
        : "=rm" (res)
        : "rm"(a), "rm"(b), "rm"(s)
        : "%rax", "%rdx", "%ecx");
    return res;
}

A portable C99 equivalent to the above code is shown below. I have tested this extensively against the inline assembly version and no mismatches were found.

#include <stdint.h>

/* unsigned 64x64->128 bit multiply built from four 32x32->64 bit products */
void umul64wide (uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = (uint64_t)(uint32_t)a;
    uint64_t a_hi = a >> 32;
    uint64_t b_lo = (uint64_t)(uint32_t)b;
    uint64_t b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;

    uint32_t cy = (uint32_t)(((p0 >> 32) + (uint32_t)p1 + (uint32_t)p2) >> 32);

    *lo = p0 + (p1 << 32) + (p2 << 32);
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + cy;
}

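void mul64wide (int64_t a, int64_t b, int64_t *hi, int64_t *lo)
{
    uint64_t uhi, ulo;
    umul64wide ((uint64_t)a, (uint64_t)b, &uhi, &ulo);
    /* signed correction, done in unsigned arithmetic so it wraps modulo
       2^64 instead of risking signed overflow */
    if (a < 0) uhi -= (uint64_t)b;
    if (b < 0) uhi -= (uint64_t)a;
    *hi = (int64_t)uhi;
    *lo = (int64_t)ulo;
}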

/* compute mul_wide (a, b) >> s, for s in [0,63] */
int64_t mulshift (int64_t a, int64_t b, int s)
{
    int64_t res;
    int64_t hi, lo;
    mul64wide (a, b, &hi, &lo);
    if (s) {
        res = ((uint64_t)hi << (64 - s)) | ((uint64_t)lo >> s);
    } else {
        res = lo;
    }
    return res;
}
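
The reasoning behind the portable version can be spelled out as a short derivation. This is a reading of the code above, not part of the original answer. Splitting each operand into 32-bit halves gives the four partial products p0 to p3, and cy is the carry out of the low 64 bits:

```latex
\begin{align*}
a &= 2^{32} a_{hi} + a_{lo}, \qquad b = 2^{32} b_{hi} + b_{lo} \\
a \cdot b &= 2^{64}\, \underbrace{a_{hi} b_{hi}}_{p_3}
           + 2^{32} \bigl( \underbrace{a_{lo} b_{hi}}_{p_1} + \underbrace{a_{hi} b_{lo}}_{p_2} \bigr)
           + \underbrace{a_{lo} b_{lo}}_{p_0} \\
cy &= \left\lfloor \frac{\lfloor p_0 / 2^{32} \rfloor + (p_1 \bmod 2^{32}) + (p_2 \bmod 2^{32})}{2^{32}} \right\rfloor
\end{align*}
```

For the signed case, a negative operand reinterpreted as unsigned is larger by 2^64. Writing [a<0] for 1 when a is negative and 0 otherwise:

```latex
\begin{align*}
a_u &= a + 2^{64} [a<0], \qquad b_u = b + 2^{64} [b<0] \\
a_u \cdot b_u &\equiv a \cdot b + 2^{64} \bigl( [a<0]\, b + [b<0]\, a \bigr) \pmod{2^{128}} \\
hi_{signed} &\equiv hi_{unsigned} - [a<0]\, b - [b<0]\, a \pmod{2^{64}}
\end{align*}
```

This is why mul64wide subtracts b from the high word when a is negative and a when b is negative, while the low word needs no correction.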
