优化经常用字谜功能 [英] Optimizing very often used anagram function
问题描述
我写一个函数,确定两个单词是否字谜。字 A是字B的字谜,如果你可以通过重新构建字B出A的 字母,例如:
I have written a function that determines whether two words are anagrams. Word A is an anagram of word B if you can build word B out of A just by rearranging the letters, e.g.:
lead is anagram of deal
这是我的功能:
bool is_anagram(std::string const & s1, std::string const & s2)
{
auto check = [](std::string const & x)
{
std::map<char, unsigned> counter;
for(auto const & c : x)
{
auto it = counter.find(c);
if(it == counter.end())
counter[c] = 1;
else
++counter[c];
}
return counter;
};
return check(s1) == check(s2);
}
这工作得很好,但作为单词数增加(和使用此功能 在我的应用程序数百万次),它很快就成为了一个主要 瓶颈我的应用程序。
This works fine, but as the number of words increases (and this function is used several million times within my application), it very soon became a major bottleneck of my application.
有没有人对如何加快这个功能了一个想法?
Does anyone have an idea of how to speed this function up?
推荐答案
地图创建和您的来电的std ::地图::找到
的迭代中,
是相当昂贵的。
The map creation and your call to std::map::find
within the iteration,
is quite expensive.
在这种情况下,
您可以使用一个事实,即的std ::在许多方面就像一个字符串
行为
的std ::矢量&lt;焦炭&GT;
,这意味着您可以使用排序的std ::排序
:
In this case,
you can use the fact that std::string
behaves in many regards like a
std::vector<char>
, meaning that you can sort it using std::sort
:
bool is_anagram(std::string s1, std::string s2)
{
std::sort(s1.begin(), s1.end());
std::sort(s2.begin(), s2.end());
return s1 == s2;
}
相反,正在创建的,我走的是字符串的副本两个地图 (按价值计算,而不是常量引用传递它们),并整理他们,所以
Instead of the two maps that you are creating, I am taking a copy of the strings (passing them by value instead of const reference) and sorting them, so
sort("lead") => "adel"
sort("deal") => "adel"
这个变化应该已经被相当多的加快你的算法了。多一个 如果你喜欢比较随意,可能会极大地影响性能的事 也就是说:
This change should already speed your algorithm up by quite a bit. One more thing that may greatly affect the performance if you tend to compare arbitrary words:
bool is_anagram(std::string s1, std::string s2)
{
if(s1.length() != s2.length())
return false;
/* as above */
}
如果两个字符串的长度不同,它显然不能是一个字谜。
的std ::字符串::长度()
是一个非常快的操作(它需要储存
字符串大小反正),所以我们为我们节省了 O(N *日志(N))
从喧嚣
排序的两个字符串。
If the length of the two strings differs, it obviously cannot be an anagram.
std::string::length()
is a very fast operation (it needs to store the
string size anyway), so we save us the hustle of O(N*log(N))
from
sorting the two strings.
这篇关于优化经常用字谜功能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!