如何散列一系列键? [英] How do I hash a range of keys?

查看:96
本文介绍了如何散列一系列键?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试做类似的事情:

我有一个整数列表,在0-99之间。

我需要将它们存储在一个数据中结构,然后能够从较低的边界散列到上边界,以尽可能快的时间复杂度访问我需要的那些。

例如,如果我有0 1 2 3 6 8 9我想要访问3到7范围内的那些,它只会返回3和6.

我知道我可以从较低的边界开始并只使用equal_range(int)函数每个值都是范围,但这是最有效的方式吗?

I am trying to do something similar to this:
I have a list of integers, anywhere from 0-99.
I need to store them in a data structure, and then be able to hash from a lower boundary to an upper boundary to access the ones I need in the fastest time complexity possible.
For example, if I had 0 1 2 3 6 8 9 and I wanted to access those in the range of 3 to 7 inclusive, it would simply just return 3 and 6.
I know I could start at the lower boundary and just use the equal_range(int) function on every value is the range, but is that the most efficient way?

推荐答案

维护有序列表的麻烦是插入成本 - O(log N) - 每次向列表中添加内容时。有点矢量会更容易。



std :: bitset几乎可以满足您的需求。它没有一种有效的方法来构建范围掩码。



boost :: flat_map看起来比你想要的更多 - 除非有专门针对bool值的专门化。



这里有一些适合你的直接C ++代码。



The trouble with maintaining an ordered list is the insertion cost - O(log N) - each time your add something to your list. A bit vector would be easier.

std::bitset almost does what you want. It lacks an efficient way to build a mask of ranges.

boost::flat_map looks like it does more than what you want - unless there's a specialization specifically for bool values.

Here's some straight C++ code that should work for you.

#include <memory.h>

#define _countof(arg) ( (sizeof arg) / (sizeof arg[0]) )

class Set
{
    typedef unsigned long long word_t;
    enum {word = 8 * sizeof(word_t), limit = 100};
    word_t mask[ (limit + word - 1) / word];
public:
    Set()
    {
        clear();
    }

    Set(const Set & copy)
    {
        memcpy(mask, copy.mask, sizeof mask);
    }

    void clear()
    {
        memset(mask, 0, sizeof mask);
    }

    bool addKey(unsigned char key)
    {
        if (key < limit)
        {
            word_t bit = 1;
            bit <<= (key % word);
            mask[key / word] |= bit;
            return true;
        }
        return false;
    }

    // range is from low (inclusive) to high (exclusive)
    bool addRange(unsigned char low, unsigned char high)
    {
        if (low < limit && high <= limit && low < high)
        {
            unsigned char low_index = low / word;
            unsigned char high_index = high / word;
            word_t work = (word_t)1 << (low % word);
            work--;
            word_t low_mask = ~work;
            work = (word_t)1 << (high % word);
            work--;
            word_t high_mask = work;
            for (unsigned char i = low_index; i <= high_index; i++)
            {
	        word_t bits = -1;
	        if (i == low_index) bits &= low_mask;
                if (i == high_index) bits &= high_mask;
                    mask[i] = bits;
            }
            return true;
        }
        return false;
    }

    Set& intersect(const Set &right)
    {
         for (unsigned char i = 0; i < _countof(mask); i++)
             mask[i] &= right.mask[i];
         return *this;
    }

    void print() const
    {
        for (unsigned char i = 0; i < _countof(mask); i++)
        {
            if (!mask[i]) continue;
            for (unsigned bit = 0; bit < word; bit++)
            {
                word_t test = 1;
                test <<= bit;
                if (mask[i] & test)
                    printf(" %2d", i * word + bit);
            }
        }
        printf("\n");
    }
};

int main(int argc, _char* argv[])
{
	Set set;

	/*
	For example, if I had 0 1 2 3 6 8 9 and I wanted to access those in the range of 3 to 7 inclusive, it would simply just return 3 and 6.
	*/

    set.addKey(0);
    set.addKey(1);
    set.addKey(2);
    set.addKey(3);
    set.addKey(6);
    set.addKey(8);
    set.addKey(9);

    printf("Set:");
    set.print();

    Set range;

    printf("Range:");
    range.addRange(3, 8);
    range.print();

    printf("Intersect:");
    range.intersect(set);
    range.print();

    getchar();
    return 0;
}
</memory.h>





以上代码使用long long作为单词大小它可以非常快速地扫描所有100位(两个64位访问)。您可以将单词大小更改为long,short或char,以查看对性能的影响。它最多可以工作255位。



The above code uses long long as the word size so it can scan all 100 bits very quickly (two 64 bit accesses). You can change the word size to long, short or char to see the effect on performance. It should work for up to 255 bits.


这篇关于如何散列一系列键?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆