How to get the square root of a 32-bit input in only one clock cycle?

Problem description

I want to design a synthesizable module in Verilog that takes only one cycle to calculate the square root of a given 32-bit input.

Recommended answer

[Edit1] fixed code

I recently found that the results were off even though the tests said everything was OK, so I dug deeper and found a silly bug in my equation. Due to name conflicts with my pgm environment, the tests gave false positives, which is why I overlooked it before. Now it works in all cases as it should.

The best thing I can think of (other than an approximation or a large LUT) is a binary search without multiplication; here is the C++ code:

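```cpp
//---------------------------------------------------------------------------
#include <cstdint>
// WORD/DWORD as in <windows.h> (16/32-bit unsigned); drop these two
// typedefs if <windows.h> is already included.
typedef uint16_t WORD;
typedef uint32_t DWORD;
//---------------------------------------------------------------------------
WORD u32_sqrt(DWORD xx) // 16 T (one result bit per iteration)
    {
    DWORD x,m,a0,a1,i;
    const DWORD lut[16]=
        {
        //     m*m
        0x40000000,
        0x10000000,
        0x04000000,
        0x01000000,
        0x00400000,
        0x00100000,
        0x00040000,
        0x00010000,
        0x00004000,
        0x00001000,
        0x00000400,
        0x00000100,
        0x00000040,
        0x00000010,
        0x00000004,
        0x00000001,
        };
    // x = partial root, a0 = x*x, m = weight of the bit being tried
    for (x=0,a0=0,m=0x8000,i=0;m;m>>=1,i++)
        {
        a1=a0+lut[i]+(x<<(16-i));       // a1 = (x+m)^2 = x*x + m*m + 2*m*x
        if (a1<=xx) { a0=a1; x|=m; }    // keep bit m if (x+m)^2 <= xx
        }
    return (WORD)x;
    }
//---------------------------------------------------------------------------
```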
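Since the edit note mentions that earlier tests gave false positives, here is a sketch of an independent check (an addition, not part of the original answer). It restates u32_sqrt so the block compiles on its own (assumption: WORD/DWORD are 16/32-bit unsigned) and verifies the defining property r*r <= xx < (r+1)*(r+1) in 64-bit arithmetic around every perfect square, where off-by-one errors would show up. The helper names is_floor_sqrt and check_u32_sqrt are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

typedef uint16_t WORD;   // assumption: 16-bit unsigned, as in <windows.h>
typedef uint32_t DWORD;  // assumption: 32-bit unsigned, as in <windows.h>

// u32_sqrt from the answer, restated so this block compiles on its own.
WORD u32_sqrt(DWORD xx)
    {
    static const DWORD lut[16] =
        {
        0x40000000, 0x10000000, 0x04000000, 0x01000000,
        0x00400000, 0x00100000, 0x00040000, 0x00010000,
        0x00004000, 0x00001000, 0x00000400, 0x00000100,
        0x00000040, 0x00000010, 0x00000004, 0x00000001,
        };
    DWORD x = 0, a0 = 0;
    for (DWORD m = 0x8000, i = 0; m; m >>= 1, i++)
        {
        DWORD a1 = a0 + lut[i] + (x << (16 - i));
        if (a1 <= xx) { a0 = a1; x |= m; }
        }
    return WORD(x);
    }

// True if r == floor(sqrt(xx)), i.e. r*r <= xx < (r+1)*(r+1).
// Uses 64-bit products so the check itself cannot overflow.
bool is_floor_sqrt(DWORD xx, DWORD r)
    {
    uint64_t lo = uint64_t(r) * r;
    uint64_t hi = uint64_t(r + 1) * (r + 1);
    return lo <= xx && xx < hi;
    }

// Checks xx = k*k - 1, k*k and k*k + 1 for every k in 1..0xFFFF,
// plus both ends of the 32-bit input range.
bool check_u32_sqrt()
    {
    for (uint64_t k = 1; k <= 0xFFFF; k++)
        {
        DWORD sq = DWORD(k * k);
        if (!is_floor_sqrt(sq - 1, u32_sqrt(sq - 1))) return false;
        if (!is_floor_sqrt(sq,     u32_sqrt(sq)))     return false;
        if (!is_floor_sqrt(sq + 1, u32_sqrt(sq + 1))) return false;
        }
    return is_floor_sqrt(0, u32_sqrt(0))
        && is_floor_sqrt(0xFFFFFFFFu, u32_sqrt(0xFFFFFFFFu));
    }
```

The check compares against the mathematical definition rather than another sqrt routine from the environment, so a name clash cannot make it pass by accident.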

A standard binary-search sqrt(xx) sets the bits of x from MSB to LSB so that the result satisfies x*x <= xx. Luckily, we can avoid the multiplication by rewriting it as an incrementally updated multiplicand: in each iteration, the previous x*x result can be reused like this:

x1 = x0+m
x1*x1 = (x0+m)*(x0+m) = (x0*x0) + (2*m*x0) + (m*m)

Here x0 is the value of x from the previous iteration and x1 is the current value. m is the weight of the bit currently being processed. The factors (2*m) and (m*m) are constants for each iteration: m*m can be taken from a LUT and multiplying x0 by 2*m is just a bit shift, so no multiplication is needed, only addition. Sadly, each iteration depends on the result of the previous one, which forbids parallelization, so the result is 16T at best.
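The recurrence can be checked numerically. The sketch below (not from the original answer; next_square and identity_holds are hypothetical names) computes x1*x1 from x0*x0 with shifts and additions only, assuming m is a power of two, m = 1 << k, and compares the result with a direct multiplication. 64-bit types are used only so the check cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

// x1 = x0 + m with m = 1 << k, so
//   x1*x1 = x0*x0 + 2*m*x0 + m*m
//         = x0*x0 + (x0 << (k + 1)) + (1 << (2*k))
// i.e. the new square needs only shifts and additions.
uint64_t next_square(uint64_t x0_squared, uint64_t x0, unsigned k)
{
    return x0_squared + (x0 << (k + 1)) + (uint64_t(1) << (2 * k));
}

// Compares next_square() with a direct multiplication (for checking only).
bool identity_holds(uint64_t x0, unsigned k)
{
    uint64_t x1 = x0 + (uint64_t(1) << k);
    return next_square(x0 * x0, x0, k) == x1 * x1;
}
```

For example, with x0 = 3 and k = 1 (m = 2), next_square(9, 3, 1) gives 9 + 12 + 4 = 25, which is 5*5.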

In the code, a0 holds the previous x*x and a1 holds the candidate x*x of the current iteration.
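To see what a0 and a1 hold in each iteration, the sketch below (hypothetical names SqrtStep and u32_sqrt_trace, with <cstdint> types assumed in place of WORD/DWORD) runs the same loop and records every step. It computes m*m directly instead of reading lut[i]; the values are identical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One row of the trace: bit tried, previous square, candidate square, decision.
struct SqrtStep { uint32_t m, a0, a1; bool kept; };

// Same loop as u32_sqrt, but records a0 (previous x*x) and a1 (candidate x*x).
uint16_t u32_sqrt_trace(uint32_t xx, std::vector<SqrtStep>& steps)
{
    uint32_t x = 0, a0 = 0;
    for (unsigned i = 0; i < 16; i++)
    {
        uint32_t m  = uint32_t(1) << (15 - i);
        uint32_t a1 = a0 + (m * m) + (x << (16 - i));   // m*m equals lut[i]
        bool kept = (a1 <= xx);
        steps.push_back({m, a0, a1, kept});
        if (kept) { a0 = a1; x |= m; }
    }
    return uint16_t(x);
}
```

For xx = 1000, every bit from m = 0x8000 down to m = 32 is rejected (for m = 32 the candidate is a1 = 1024 > 1000), and the kept candidates are 256, 576, 784, 900 and 961, that is 16*16, 24*24, 28*28, 30*30 and 31*31, giving the result 31.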

As you can see, the sqrt is done in 16 × (BitShiftLeft, BitShiftRight, OR, Plus, Compare), where the bit shifts and the LUT can be hardwired.
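Because every step is only shifts, OR, additions and one compare, the loop can also be unrolled into 16 chained stages. The sketch below is not from the answer (sqrt_stage, Chain and u32_sqrt_unrolled are hypothetical names, and it is C++ rather than Verilog): the step index is a template parameter, so each shift amount and m*m (computed as 1 << (30 - 2*I) instead of a LUT) are compile-time constants, modelling the "hardwired" shifts. The stages still depend on each other, so this does not remove the sequential dependency the answer describes; it only lays it out as one long combinational path.

```cpp
#include <cassert>
#include <cstdint>

// Values passed from one stage to the next: partial root x and a0 = x*x.
struct SqrtState { uint32_t x, a0; };

// Stage I tries bit m = 1 << (15 - I). Since I is a compile-time constant,
// m, m*m and the shift amount are fixed (in hardware: wiring, no logic).
template <unsigned I>
SqrtState sqrt_stage(SqrtState s, uint32_t xx)
{
    const uint32_t m  = uint32_t(1) << (15 - I);
    const uint32_t mm = uint32_t(1) << (30 - 2 * I);    // m*m, replaces lut[I]
    uint32_t a1 = s.a0 + mm + (s.x << (16 - I));        // (x + m)^2
    if (a1 <= xx) return SqrtState{s.x | m, a1};        // compare, then OR
    return s;
}

// Chain<I> runs stages I..15 in order; each needs the previous stage's output.
template <unsigned I>
struct Chain
{
    static SqrtState run(SqrtState s, uint32_t xx)
    {
        return Chain<I + 1>::run(sqrt_stage<I>(s, xx), xx);
    }
};

// End of the chain: after stage 15 the state holds the final result.
template <>
struct Chain<16>
{
    static SqrtState run(SqrtState s, uint32_t) { return s; }
};

uint16_t u32_sqrt_unrolled(uint32_t xx)
{
    return uint16_t(Chain<0>::run(SqrtState{0, 0}, xx).x);
}
```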

If the gates for this are very fast compared to the rest of your design, you can multiply the input clock by 16 and use that as the internal clock of the SQRT module. This is similar to the old days when old Intel CPUs/MCUs had an MC clock derived by dividing the source CPU clock. This way you can get 1T timing (or a multiple of it, depending on the multiplication ratio).
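The clock-multiplication idea can be modelled at cycle level. The sketch below is a simplified model under stated assumptions, not the answer's design: a hypothetical SqrtUnit produces one result bit per internal (fast) clock tick, and a hypothetical system_cycle() applies `ratio` fast ticks per system clock cycle. With ratio = 16 the result is ready after one system cycle; with ratio = 8 it takes two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sequential SQRT unit: one bit of the root per fast clock tick.
class SqrtUnit
{
public:
    void start(uint32_t xx) { xx_ = xx; x_ = 0; a0_ = 0; i_ = 0; }
    bool done() const { return i_ == 16; }
    uint16_t result() const { return uint16_t(x_); }

    // One fast-clock tick: try bit m = 1 << (15 - i_) using shifts and adds.
    void tick()
    {
        if (done()) return;
        uint32_t m  = uint32_t(1) << (15 - i_);
        uint32_t a1 = a0_ + (uint32_t(1) << (30 - 2 * i_)) + (x_ << (16 - i_));
        if (a1 <= xx_) { a0_ = a1; x_ |= m; }
        i_++;
    }

private:
    uint32_t xx_ = 0, x_ = 0, a0_ = 0;
    unsigned i_ = 16;   // idle until start() is called
};

// One system-clock cycle, assuming the internal clock runs `ratio` times faster.
void system_cycle(SqrtUnit& u, unsigned ratio = 16)
{
    for (unsigned t = 0; t < ratio; t++) u.tick();
}
```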
