Fastest way to write a bitstream on modern x86 hardware
Question
What is the fastest way to write a bitstream on x86/x86-64? (Codewords are at most 32 bits.)
By writing a bitstream, I mean the process of concatenating variable bit-length symbols into a contiguous memory buffer.
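To make that definition concrete, here is a deliberately naive reference packer that appends one bit at a time. It is only a sketch for checking faster implementations against: the name append_bits_naive is hypothetical, and the least-significant-bit-first ordering is an assumption chosen to match the shifts in the code in this question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper (not from the original post): appends the low
// nbits bits of codeword to out, least significant bit first.
// bit_len tracks how many bits have been written so far.
inline void append_bits_naive(std::vector<std::uint8_t>& out, std::size_t& bit_len,
                              std::uint32_t codeword, unsigned nbits) {
    assert(nbits <= 32);
    for (unsigned i = 0; i < nbits; ++i) {
        if (bit_len % 8 == 0) {
            out.push_back(0);  // start a new, zeroed byte
        }
        const unsigned bit = (codeword >> i) & 1u;
        out.back() = static_cast<std::uint8_t>(out.back() | (bit << (bit_len % 8)));
        ++bit_len;
    }
}
```

With this layout, appending the 3-bit symbol 0x5 followed by the 5-bit symbol 0x1F fills exactly one byte, 0xFD. A loop like this is far too slow for production use, but it is easy to verify by hand.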
Currently I have a standard container with a 32-bit intermediate buffer to write to:
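void write_bits(SomeContainer<unsigned int>& dst, unsigned int& buffer,
                unsigned int& bits_left_in_buffer, unsigned int codeword,
                unsigned int bits_to_write) {
    // Assumes 1 <= bits_left_in_buffer <= 32 and that codeword has no bits set
    // above bits_to_write. Unsigned types keep the shifts well defined.
    if (bits_to_write < bits_left_in_buffer) {
        // The codeword fits with room to spare: place it above the bits already stored.
        buffer |= codeword << (32 - bits_left_in_buffer);
        bits_left_in_buffer -= bits_to_write;
    } else {
        // The codeword fills the buffer: emit a full word and keep the overflow bits.
        unsigned int full_bits = bits_to_write - bits_left_in_buffer;
        unsigned int towrite = buffer | (codeword << (32 - bits_left_in_buffer));
        buffer = full_bits ? (codeword >> bits_left_in_buffer) : 0;
        dst.push_back(towrite);
        bits_left_in_buffer = 32 - full_bits;
    }
}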
Does anyone know of any good optimizations, fast instructions, or other information that might be useful?
Cheers,
Recommended answer
I once wrote a fairly fast implementation, but it has several limitations: it works when the bitstream is both written and read on 32-bit x86. It does not check buffer limits here; I allocated a larger buffer and checked it from time to time from the calling code.
unsigned char* membuff;
unsigned bit_pos; // current BIT position in the buffer, so its maximum size is 512 MB

// BIT_MASKS is not shown in the original post; presumably BIT_MASKS[n] == (1u << n) - 1.

// Input bit buffer: the byte address is rounded down to an even value, so the
// DWORD read from that address always has at least 17 usable bits.
inline unsigned int get_bits(unsigned int bit_cnt) { // bit_cnt MUST be in range 0..17
    unsigned int byte_offset = bit_pos >> 3;
    byte_offset &= ~1; // round down to a multiple of 2
    unsigned int bits = *(unsigned int*)(membuff + byte_offset); // unaligned DWORD read, tolerated by x86
    bits >>= bit_pos & 0xF;
    bit_pos += bit_cnt;
    return bits & BIT_MASKS[bit_cnt];
}

// Output buffer: the whole destination must be memset to 0 beforehand.
// val must fit in bit_cnt bits, and bit_cnt must be in range 0..17 as in get_bits.
inline void put_bits(unsigned int val, unsigned int bit_cnt) { // returns nothing, so declared void
    unsigned int byte_offset = bit_pos >> 3;
    byte_offset &= ~1;
    *(unsigned int*)(membuff + byte_offset) |= val << (bit_pos & 0xF);
    bit_pos += bit_cnt;
}
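For anyone who wants to experiment with the idea in this answer, the following is a minimal round-trip sketch of it, not the author's exact code. It makes several assumptions: the BitStream struct and its put and get members are hypothetical names, std::memcpy is swapped in for the pointer casts to avoid aliasing and alignment issues, the mask is computed instead of read from the BIT_MASKS table, and the host is little-endian (as x86 is).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical wrapper around the answer's technique; names are illustrative.
struct BitStream {
    std::vector<unsigned char> buf;  // zero-filled, with 4 bytes of slack at the end
    std::uint32_t bit_pos = 0;       // current bit position in buf

    explicit BitStream(std::size_t bytes) : buf(bytes + 4, 0) {}

    // Append the low bit_cnt bits of val. Requires bit_cnt in 0..17 and no
    // bits of val set above bit_cnt.
    void put(std::uint32_t val, unsigned bit_cnt) {
        assert(bit_cnt <= 17 && (val >> bit_cnt) == 0);
        // Round the byte offset down to an even address, as in the answer.
        const std::size_t byte_offset = (bit_pos >> 3) & ~std::size_t{1};
        assert(byte_offset + 4 <= buf.size());
        std::uint32_t word;
        std::memcpy(&word, buf.data() + byte_offset, 4);  // little-endian assumed
        word |= val << (bit_pos & 0xF);
        std::memcpy(buf.data() + byte_offset, &word, 4);
        bit_pos += bit_cnt;
    }

    // Read the next bit_cnt bits (bit_cnt in 0..17) and advance bit_pos.
    std::uint32_t get(unsigned bit_cnt) {
        assert(bit_cnt <= 17);
        const std::size_t byte_offset = (bit_pos >> 3) & ~std::size_t{1};
        assert(byte_offset + 4 <= buf.size());
        std::uint32_t word;
        std::memcpy(&word, buf.data() + byte_offset, 4);
        word >>= (bit_pos & 0xF);
        bit_pos += bit_cnt;
        return word & ((1u << bit_cnt) - 1u);  // computed mask instead of BIT_MASKS
    }
};
```

To read back what was written, reset bit_pos to 0 and call get with the same sequence of bit counts used for put. Compilers typically turn a fixed-size 4-byte memcpy into a single load or store on x86, so this form should cost little compared with the cast, though that is worth confirming in the generated code for your compiler.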