确定字符是否属于一组已知字符的最快方法C ++ [英] Fastest Way to Determine if Character Belongs to a Set of Known Characters C++
问题描述
给定任何字符,我确定该字符是否属于已知字符的集合(而不是容器类型)的最快方法是什么。
Given any character, what's the fastest way for me to determine if that character belongs to a set (not the container-type) of known characters.
在其他词语,实现条件的最快优雅方式是什么:
char c ='a';
if(c == ch1 || c == ch2 || c == ch3 ...)//做某事...
In other words, what's the fastest elegant way to implement the conditional:
char c = 'a';
if(c == ch1 || c == ch2 || c == ch3 ...) // Do something...
是否有一个STL容器(我认为它可能是unordered_set?),我可以只传递该字符作为键,如果键存在,它将返回true
Is there an STL container (I'm thinking it might be unordered_set?) that I can just pass the character as a key to and it'll return true if the key exists?
任何具有O(1)查找时间的东西都会为我工作。
Anything with an O(1) lookup time will work for me.
推荐答案
并写了两个版本,一个基于查找数组,另一个使用底层哈希集合。
I went a little further and wrote two versions, one based on a lookup array, the other on a set using an underlying hash.
class CharLookup {
public:
CharLookup(const std::string & set) : lookup(*std::max_element(set.begin(), set.end()) + 1) {
for ( auto c : set) lookup[c] = true;
}
inline bool has(const unsigned char c) const {
return c > lookup.size() ? false : lookup[c];
}
private:
std::vector<bool> lookup;
};
class CharSet {
public:
CharSet(const std::string & cset) {
for ( auto c : cset) set.insert(c);
}
inline bool has(const unsigned char c) const {
return set.contains(c);
}
private:
QSet<unsigned char> set;
};
然后写了一个基准,添加了几个容器,越低越好,数据点为字符集大小/文字大小:
Then wrote a little benchmark, added a few more containers for the sake of comparison. Lower is better, the data points are for "character set size / text size":
看起来像短字符集和文本, std :: string :: find_first_of
是最快的,甚至比使用查找数组更快,但随着测试大小的增加而迅速减少。 std :: vector< bool>
看起来像金色的意思, QBitArray
可能有一点不同的实现,当测试大小增加时,在最大测试 QVector< bool>
是最快的,可能是因为它没有位访问的开销。两个哈希集是关闭的,交易场所,最后和最少有 std :: set
。
Seems like for short char sets and text, std::string::find_first_of
is fastest, even faster than using a lookup array, but quickly dwindles as the test size increases. std::vector<bool>
seems like the "golden mean", QBitArray
probably has a little different implementation because it pulls ahead as the test size increases, at the largest test QVector<bool>
is fastest, presumably because it doesn't have the overhead of bit access. The two hash sets are close, trading places, last and least there is the std::set
.
测试在i7-3770k Win7 x64框上,使用带有-O3的MinGW 4.9.1 x32。
Tested on an i7-3770k Win7 x64 box, using MinGW 4.9.1 x32 with -O3.
这篇关于确定字符是否属于一组已知字符的最快方法C ++的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!