Most efficient formula for unpacking 16-bit BCD? (e.g. 0x1234 to 0x01020304)


Question

Is there a bit twiddling hack for efficiently unpacking a 16-bit packed BCD number?

Doing it the pedestrian way requires 10 operations (3 shifts, 4 ANDs and 3 ORs or ADDs):

x = (bcd & 0xF000) << 12
  | (bcd & 0x0F00) <<  8
  | (bcd & 0x00F0) <<  4
  | (bcd & 0x000F)

With multi-way ADD/OR the critical path length would be 3 but these operations tend to be binary and so most CPUs would be looking at a critical path of length 4.
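The four levels are easiest to see when the expression is written out with one temporary per operation. The sketch below is only an illustration: the function name `bcd_unpack_levels` and the temporaries are made up for this example, and a compiler is free to schedule the operations differently.

```c
#include <stdint.h>

/* Hypothetical helper: the pedestrian formula, one operation per line,
   grouped by dependency depth (assumes binary AND/shift/OR instructions). */
uint32_t bcd_unpack_levels(uint32_t bcd)
{
    /* level 1: four independent ANDs */
    uint32_t d3 = bcd & 0xF000;
    uint32_t d2 = bcd & 0x0F00;
    uint32_t d1 = bcd & 0x00F0;
    uint32_t d0 = bcd & 0x000F;

    /* level 2: three independent shifts */
    uint32_t s3 = d3 << 12;
    uint32_t s2 = d2 << 8;
    uint32_t s1 = d1 << 4;

    /* level 3: two ORs that do not depend on each other */
    uint32_t hi = s3 | s2;
    uint32_t lo = s1 | d0;

    /* level 4: the final OR */
    return hi | lo;
}
```

That is 10 operations in total, and the longest chain (AND, shift, OR, OR) has length 4.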

Can this be done more efficiently?

Note: for some purposes it can be equally useful if some permutation of the nibbles can be unpacked especially efficiently, like if the word to be unpacked comes from a lookup table over whose creation I have full control (so that I can stick each digit wherever I want). The purpose of using packed instead of unpacked BCD in this case would be to halve the memory pressure and to avoid exceeding the size of the L1 cache, taking some load off an over-saturated memory subsystem by increasing the load on the CPU's ALUs.

For example, if I permute the digits like 0x1324 then a simple de-interleave yields 0x01020304:

x = ((bcd << 12) | bcd) & 0x0F0F0F0F

That's just three operations with critical path length 3, quite an improvement over the original version...
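As a minimal sketch of that idea, assuming the table is built offline and each entry can be stored with its two middle nibbles swapped: the names `bcd_permute` and `bcd_unpack_permuted` are invented for this illustration.

```c
#include <stdint.h>

/* Hypothetical table-build step: swap the two middle nibbles,
   e.g. 0x1234 -> 0x1324. This cost is paid once, not per lookup. */
uint16_t bcd_permute(uint16_t bcd)
{
    return (uint16_t)((bcd & 0xF00F) |
                      ((bcd & 0x0F00) >> 4) |
                      ((bcd & 0x00F0) << 4));
}

/* Hypothetical lookup-time step: de-interleave a permuted entry,
   e.g. 0x1324 -> 0x01020304, using one shift, one OR and one AND. */
uint32_t bcd_unpack_permuted(uint32_t p)
{
    return ((p << 12) | p) & 0x0F0F0F0F;
}
```

For 0x1324 the intermediate value is 0x01325324, and the mask keeps every second nibble, giving 0x01020304.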

Solution

The most efficient solution will be machine specific, as different ISAs have different capabilities when it comes to dealing with immediate constants, or combining shifts with ALU operations. Here is an alternative implementation with good instruction-level parallelism that may be superior on platforms with a very fast integer multiply. Integer multiply is often helpful for bit twiddling algorithms by performing multiple shift-add operations in parallel.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* reference implementation */
uint32_t bcd_spread_1 (uint32_t a)
{
    return (((a & 0xF000) << 12) |
            ((a & 0x0F00) <<  8) |
            ((a & 0x00F0) <<  4) |
            ((a & 0x000F) <<  0));
}

/* alternative implementation */
uint32_t bcd_spread_2 (uint32_t a)
{
    return ((((a & 0xf0f0) * 0x1010) & 0x0f000f00) |
            (((a & 0x0f0f) * 0x0101) & 0x000f000f));
}

/* BCD addition. Knuth TAOCP 4 */
uint32_t median (uint32_t x, uint32_t y, uint32_t z)
{
    return (x & (y | z)) | (y & z);
}

uint32_t bcd_add (uint32_t x, uint32_t y)
{
    uint32_t z, u, t;
    z = y + 0x66666666;
    u = x + z;
    t = median (~x, ~z, u) & 0x88888888;
    return u - t + (t >> 2);
}

int main (void)
{
    uint32_t x, y, bcd = 0;
    do {
        x = bcd_spread_1 (bcd);
        y = bcd_spread_2 (bcd);
        if (x != y) {
            printf ("!!!! bcd=%04x x=%08x y=%08x\n", bcd, x, y);
            return EXIT_FAILURE;
        }
        bcd = bcd_add (bcd, 1);
    } while (bcd < 0x10000);
    return EXIT_SUCCESS;
}
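To see why the multiplications work, it may help to expand each one into the shift-add pair it stands for: multiplying by 0x1010 adds copies shifted left by 12 and by 4, and multiplying by 0x0101 adds a copy shifted left by 8 to the unshifted value. The function below is only an illustrative rewrite of `bcd_spread_2` under that reading; the name `bcd_spread_shift_add` is made up here.

```c
#include <stdint.h>

/* Hypothetical expansion of bcd_spread_2 with the multiplies
   replaced by the equivalent explicit shift-add pairs. */
uint32_t bcd_spread_shift_add(uint32_t a)
{
    uint32_t hi = a & 0xf0f0;   /* digits in nibbles 3 and 1 */
    uint32_t lo = a & 0x0f0f;   /* digits in nibbles 2 and 0 */

    /* hi * 0x1010 == (hi << 12) + (hi << 4) */
    uint32_t hi_spread = ((hi << 12) + (hi << 4)) & 0x0f000f00;

    /* lo * 0x0101 == (lo << 8) + lo */
    uint32_t lo_spread = ((lo << 8) + lo) & 0x000f000f;

    return hi_spread | lo_spread;
}
```

For a = 0x1234, the first product is 0x01040300, which masks to 0x01000300; the second is 0x00020604, which masks to 0x00020004. ORing the two gives 0x01020304. The overlapping copies do produce stray sums and carries, but only in nibbles that the masks discard.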
