整数数组的位打包 [英] Bit packing of array of integers

查看:204
本文介绍了整数数组的位打包的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个整数数组,让我们假设他们是类型中的int64_t 。现在,我知道,只有每首每个整数n个位是有意义的(即,我知道他们是由一些边界限制)。

I have an array of integers, lets assume they are of type int64_t. Now, I know that only every first n bits of every integer are meaningful (that is, I know that they are limited by some bounds).

什么是数组转换中,所有不必要的空间被删除的方式(最有效的方式就是我在 A [0] ,第二个是第一个整数逐一 A [0] + n位等)?

What is the most efficient way to convert the array in the way that all unnecessary space is removed (i.e. I have the first integer at a[0], the second one at a[0] + n bits and so on) ?

我想它是一般尽可能多的,因为 N 会时常发生变化的时候,虽然我想有可能是特定的智能优化 N 象2或某事的权力。

I would like it to be general as much as possible, because n would vary from time to time, though I guess there might be smart optimizations for specific n like powers of 2 or sth.

当然,我知道我可以在值只是迭代的价值,我只是想问问你StackOverflowers,如果你能想到的一些更聪明的办法。

Of course I know that I can just iterate value over value, I just want to ask you StackOverflowers if you can think of some more clever way.

编辑:

这问题不是COM pressing阵列采取的最小距离越好。我只需要一刀切 n位从每一个整数,并给予该数组我知道确切的 N 位的我可以安全地切割。

This question is not about compressing the array to take as least space as possible. I just need to "cut" n bits from every integer and given the array I know the exact n of bits I can safely cut.

推荐答案

我同意这一点,你需要使用类似Huffman编码或者可能的Lempel-谢夫-Welch算法keraba。有位打包你所谈论的方式的问题是,你有两种选择:

I agree with keraba that you need to use something like Huffman coding or perhaps the Lempel-Ziv-Welch algorithm. The problem with bit-packing the way you are talking about is that you have two options:


  • 选择一个常数n使得最大整数可以重新presented。

  • 允许n至从价值各不相同的值。

第一个选项是比较容易实现,但真的要浪费大量的空间,除非所有整数相当小。

The first option is relatively easy to implement, but is really going to waste a lot of space unless all integers are rather small.

第二个选项的主要缺点,你必须在输出比特流以某种方式传达n可以随意更改。例如,每个值将必须有与它相关的长度。这意味着你存储两个整数(虽然较小整数),每个输入值。有一个很好的机会,你会增加文件大小用这种方法。

The second option has the major disadvantage that you have to convey changes in n somehow in the output bitstream. For instance, each value will have to have a length associated with it. This means you are storing two integers (albeit smaller integers) for every input value. There's a good chance you'll increase the file size with this method.

霍夫曼或LZW的优点在于,它们以这样一种方式,codeS的长度可以从输出比特流中导出,而无需实际存储的长度创建codebooks。这些技术让你得到非常接近香农极限。

The advantage of Huffman or LZW is that they create codebooks in such a way that the length of the codes can be derived from the output bitstream without actually storing the lengths. These techniques allow you to get very close to the Shannon limit.

我决定给你最初的想法(常数n,删除未使用的比特和包)一个有趣的尝试,这里是我想出了幼稚的做法:

I decided to give your original idea (constant n, remove unused bits and pack) a try for fun and here is the naive implementation I came up with:

#include <sys/types.h>
#include <stdio.h>

int pack(int64_t* input, int nin, void* output, int n)
{
    int64_t inmask = 0;
    unsigned char* pout = (unsigned char*)output;
    int obit = 0;
    int nout = 0;
    *pout = 0;

    for(int i=0; i<nin; i++)
    {
        inmask = (int64_t)1 << (n-1);
        for(int k=0; k<n; k++)
        {
            if(obit>7)
            {
                obit = 0;
                pout++;
                *pout = 0;
            }
            *pout |= (((input[i] & inmask) >> (n-k-1)) << (7-obit));
            inmask >>= 1;
            obit++;
            nout++;
        }
    }
    return nout;
}

int unpack(void* input, int nbitsin, int64_t* output, int n)
{
    unsigned char* pin = (unsigned char*)input;
    int64_t* pout = output;
    int nbits = nbitsin;
    unsigned char inmask = 0x80;
    int inbit = 0;
    int nout = 0;
    while(nbits > 0)
    {
        *pout = 0;
        for(int i=0; i<n; i++)
        {
            if(inbit > 7)
            {
                pin++;
                inbit = 0;
            }
            *pout |= ((int64_t)((*pin & (inmask >> inbit)) >> (7-inbit))) << (n-i-1);
            inbit++;
        }
        pout++;
        nbits -= n;
        nout++;
    }
    return nout;
}

int main()
{
    int64_t input[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};
    int64_t output[21];
    unsigned char compressed[21*8];
    int n = 5;

    int nbits = pack(input, 21, compressed, n);
    int nout = unpack(compressed, nbits, output, n);

    for(int i=0; i<=20; i++)
        printf("input: %lld   output: %lld\n", input[i], output[i]);
}

这是非常低效的,因为就是在步一次一位,但那是实现它没有处理字节序的问题,最简单的方法。我没有任何具有广泛的价值,只是在测试的那些测试这一点。此外,不存在边界检查,并假设该输出缓冲器足够长。所以,我的意思是,这code可能是唯一的好教育目的,让你开始。

This is very inefficient because is steps one bit at a time, but that was the easiest way to implement it without dealing with issues of endianess. I have not tested this either with a wide range of values, just the ones in the test. Also, there is no bounds checking and it is assumed the output buffers are long enough. So what I am saying is that this code is probably only good for educational purposes to get you started.

这篇关于整数数组的位打包的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆