确定字符是否属于一组已知字符的最快方法C ++ [英] Fastest Way to Determine if Character Belongs to a Set of Known Characters C++

查看:217
本文介绍了确定字符是否属于一组已知字符的最快方法C ++的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给定任何字符,我确定该字符是否属于已知字符的集合(而不是容器类型)的最快方法是什么。

Given any character, what's the fastest way for me to determine if that character belongs to a set (not the container-type) of known characters.

在其他词语,实现条件的最快优雅方式是什么:

char c ='a';

if(c == ch1 || c == ch2 || c == ch3 ...)//做某事...

In other words, what's the fastest elegant way to implement the conditional:
char c = 'a';
if(c == ch1 || c == ch2 || c == ch3 ...) // Do something...

是否有一个STL容器(我认为它可能是unordered_set?),我可以只传递该字符作为键,如果键存在,它将返回true

Is there an STL container (I'm thinking it might be unordered_set?) that I can just pass the character as a key to and it'll return true if the key exists?

任何具有O(1)查找时间的东西都会为我工作。

Anything with an O(1) lookup time will work for me.

推荐答案

并写了两个版本,一个基于查找数组,另一个使用底层哈希集合。

I went a little further and wrote two versions, one based on a lookup array, the other on a set using an underlying hash.

class CharLookup {
public:
  CharLookup(const std::string & set) : lookup(*std::max_element(set.begin(), set.end()) + 1) {
    for ( auto c : set) lookup[c] = true;
  }
  inline bool has(const unsigned char c) const {
    return c > lookup.size() ? false : lookup[c];
  }
private:
  std::vector<bool> lookup;
};

class CharSet {
public:
  CharSet(const std::string & cset) {
    for ( auto c : cset) set.insert(c);
  }
  inline bool has(const unsigned char c) const {
    return set.contains(c);
  }
private:
  QSet<unsigned char> set;
};

然后写了一个基准,添加了几个容器,越低越好,数据点为字符集大小/文字大小:

Then wrote a little benchmark, added a few more containers for the sake of comparison. Lower is better, the data points are for "character set size / text size":

看起来像短字符集和文本, std :: string :: find_first_of 是最快的,甚至比使用查找数组更快,但随着测试大小的增加而迅速减少。 std :: vector< bool> 看起来像金色的意思, QBitArray 可能有一点不同的实现,当测试大小增加时,在最大测试 QVector< bool> 是最快的,可能是因为它没有位访问的开销。两个哈希集是关闭的,交易场所,最后和最少有 std :: set

Seems like for short char sets and text, std::string::find_first_of is fastest, even faster than using a lookup array, but quickly dwindles as the test size increases. std::vector<bool> seems like the "golden mean", QBitArray probably has a little different implementation because it pulls ahead as the test size increases, at the largest test QVector<bool> is fastest, presumably because it doesn't have the overhead of bit access. The two hash sets are close, trading places, last and least there is the std::set.

测试在i7-3770k Win7 x64框上,使用带有-O3的MinGW 4.9.1 x32。

Tested on an i7-3770k Win7 x64 box, using MinGW 4.9.1 x32 with -O3.

这篇关于确定字符是否属于一组已知字符的最快方法C ++的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆