比较c中字节数组中的任意位序列 [英] Comparing arbitrary bit sequences in a byte array in c

查看:111
本文介绍了比较c中字节数组中的任意位序列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的C代码中有两个uint8_t数组,我想将任意一个序列位与另一个序列进行比较.例如,我有bitarray_1和bitarray_2,我想将bitarray_1的13-47位与bitarray_2的5-39位进行比较.最有效的方法是什么?

当前,这是我程序中的一个巨大瓶颈,因为我只有一个幼稚的实现,可以将这些位复制到新的临时数组的开头,然后对它们使用memcmp.

解决方案

三个词:shift,mask和xor.

移位以获得两个位数组的相同内存对齐方式.如果不是,则在比较它们之前必须先移动其中一个数组.您的示例可能会引起误解,因为位13-47和5-39在8位地址上具有相同的内存对齐方式.如果您将14-48位与5-39位进行比较,那将是不正确的.

一旦所有内容对齐并且超出表边界清除的位,则xor足以立即执行所有位的比较.基本上,您可以通过为每个阵列读取一个内存来做到这一点,这应该非常有效.

如果两个数组的内存对齐方式与示例memcmp中的相同,并且上限和下限的特殊情况可能更快.

通过uint32_t(或在64位体系结构上的uint64_t)访问数组也应该比通过uint8_t访问数组更有效.

原理很简单,但是正如Andrejs所说的那样,实现起来并非没有痛苦...

这是怎么回事(与@caf提案的相似之处并非巧合):

/* compare_bit_sequence() */
int compare_bit_sequence(uint8_t s1[], unsigned s1_off, uint8_t s2[], unsigned s2_off,
    unsigned length)
{
const uint8_t mask_lo_bits[] =
    { 0x00, 0x01, 0x03, 0x07, 0x0f, 0x1f, 0x3f, 0x7f, 0xff };
const uint8_t clear_lo_bits[] =
    { 0xff, 0xfe, 0xfc, 0xf8, 0xf0, 0xe0, 0xc0, 0x80, 0x00 };
uint8_t v1;
uint8_t * max_s1;
unsigned end;
uint8_t lsl;
uint8_t v1_mask;
int delta;

/* Makes sure the offsets are less than 8 bits */
s1 += s1_off >> 3;
s1_off &= 7;

s2 += s2_off >> 3;
s2_off &= 7;

/* Make sure s2 is the sequence with the shorter offset */
if (s2_off > s1_off){
    uint8_t * tmp_s;
    unsigned tmp_off;
    tmp_s = s2; s2 = s1; s1 = tmp_s;
    tmp_off = s2_off; s2_off = s1_off; s1_off = tmp_off;
}
delta = s1_off;

/* handle the beginning, s2 incomplete */ 
if (s2_off > 0){
    delta = s1_off - s2_off;
    v1 = delta
       ? (s1[0] >> delta | s1[1] << (8 - delta)) & clear_lo_bits[delta]
       : s1[0];
       if (length <= 8 - s2_off){
           if ((v1 ^ *s2)
                & clear_lo_bits[s2_off]
                & mask_lo_bits[s2_off + length]){
                return NOT_EQUAL;
           }
           else {
               return EQUAL;
           }
        }
        else{
            if ((v1 ^ *s2) & clear_lo_bits[s2_off]){
                return NOT_EQUAL;
        }
        length -= 8 - s2_off;
    }
    s1++;
    s2++;
}

/* main loop, we test one group of 8 bits of v2 at each loop */
max_s1 = s1 + (length >> 3);
lsl = 8 - delta;
v1_mask = clear_lo_bits[delta];
while (s1 < max_s1)
{
    if ((*s1 >> delta | (*++s1 << lsl & v1_mask)) ^ *s2++)
    {
        return NOT_EQUAL;
    }
}

/* last group of bits v2 incomplete */
end = length & 7;
if (end && ((*s2 ^ *s1 >> delta) & mask_lo_bits[end]))
{
    return NOT_EQUAL;
}

return EQUAL;

}

尚未使用所有可能的优化.一种有前途的方法是使用更大的数据块(一次使用64位或32位而不是8位),您还可以检测到两个数组的偏移都已同步的情况,在这种情况下,请使用memcmp而不是主循环,替换逻辑运算符&的模数%8 7,将"/8"替换为">> 3"等,必须替换代码分支,而不是交换s1和s2等,但主要目的是实现的:每个数组项仅读取一个内存,不写入内存因此,大部分工作都可以在处理器寄存器中进行.

I have a couple uint8_t arrays in my c code, and I'd like to compare an arbitrary sequence bits from one with another. So for example, I have bitarray_1 and bitarray_2, and I'd like to compare bits 13 - 47 from bitarray_1 with bits 5-39 of bitarray_2. What is the most efficient way to do this?

Currently it's a huge bottleneck in my program, since I just have a naive implementation that copies the bits into the beginning of a new temporary array, and then uses memcmp on them.

解决方案

three words: shift, mask and xor.

shift to get the same memory alignment for both bitarray. If not you will have to shift one of the arrays before comparing them. Your exemple is probably misleading because bits 13-47 and 5-39 have the same memory alignment on 8 bits addresses. This wouldn't be true if you were comparing say bits 14-48 with bits 5-39.

Once everything is aligned and exceeding bits cleared for table boundaries a xor is enough to perform the comparison of all the bits at once. Basically you can manage to do it with just one memory read for each array, which should be pretty efficient.

If memory alignment is the same for both arrays as in your example memcmp and special case for upper and lower bound is probably yet faster.

Also accessing array by uint32_t (or uint64_t on 64 bits architectures) should also be more efficient than accessing by uint8_t.

The principle is simple but as Andrejs said the implementation is not painless...

Here is how it goes (similarities with @caf proposal is no coincidence):

/* compare_bit_sequence() */
int compare_bit_sequence(uint8_t s1[], unsigned s1_off, uint8_t s2[], unsigned s2_off,
    unsigned length)
{
const uint8_t mask_lo_bits[] =
    { 0x00, 0x01, 0x03, 0x07, 0x0f, 0x1f, 0x3f, 0x7f, 0xff };
const uint8_t clear_lo_bits[] =
    { 0xff, 0xfe, 0xfc, 0xf8, 0xf0, 0xe0, 0xc0, 0x80, 0x00 };
uint8_t v1;
uint8_t * max_s1;
unsigned end;
uint8_t lsl;
uint8_t v1_mask;
int delta;

/* Makes sure the offsets are less than 8 bits */
s1 += s1_off >> 3;
s1_off &= 7;

s2 += s2_off >> 3;
s2_off &= 7;

/* Make sure s2 is the sequence with the shorter offset */
if (s2_off > s1_off){
    uint8_t * tmp_s;
    unsigned tmp_off;
    tmp_s = s2; s2 = s1; s1 = tmp_s;
    tmp_off = s2_off; s2_off = s1_off; s1_off = tmp_off;
}
delta = s1_off;

/* handle the beginning, s2 incomplete */ 
if (s2_off > 0){
    delta = s1_off - s2_off;
    v1 = delta
       ? (s1[0] >> delta | s1[1] << (8 - delta)) & clear_lo_bits[delta]
       : s1[0];
       if (length <= 8 - s2_off){
           if ((v1 ^ *s2)
                & clear_lo_bits[s2_off]
                & mask_lo_bits[s2_off + length]){
                return NOT_EQUAL;
           }
           else {
               return EQUAL;
           }
        }
        else{
            if ((v1 ^ *s2) & clear_lo_bits[s2_off]){
                return NOT_EQUAL;
        }
        length -= 8 - s2_off;
    }
    s1++;
    s2++;
}

/* main loop, we test one group of 8 bits of v2 at each loop */
max_s1 = s1 + (length >> 3);
lsl = 8 - delta;
v1_mask = clear_lo_bits[delta];
while (s1 < max_s1)
{
    if ((*s1 >> delta | (*++s1 << lsl & v1_mask)) ^ *s2++)
    {
        return NOT_EQUAL;
    }
}

/* last group of bits v2 incomplete */
end = length & 7;
if (end && ((*s2 ^ *s1 >> delta) & mask_lo_bits[end]))
{
    return NOT_EQUAL;
}

return EQUAL;

}

All possible optimisations are not yet used. One promising one would be to use larger chunks of data (64 bits or 32 bits at once instead of 8), you could also detect cases where offset are synchronised for both arrays and in such cases use a memcmp instead of the main loop, replace modulos % 8 by logical operators & 7, replace '/ 8' by '>> 3', etc., have to branches of code instead of swapping s1 and s2, etc, but the main purpose is achieved : only one memory read and not memory write for each array item hence most of the work can take place inside processor registers.

这篇关于比较c中字节数组中的任意位序列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆