What's the most efficient way to erase duplicates and sort a vector?
I need to take a C++ vector with potentially a lot of elements, erase duplicates, and sort it.
I currently have the following code, but it doesn't work:
vec.erase(
std::unique(vec.begin(), vec.end()),
vec.end());
std::sort(vec.begin(), vec.end());
How can I correctly do this?
Additionally, is it faster to erase the duplicates first (as coded above) or to perform the sort first? If I perform the sort first, is it guaranteed to remain sorted after std::unique is executed?
Or is there another (perhaps more efficient) way to do all this?
I agree with R. Pate and Todd Gardner; a std::set might be a good idea here. Even if you're stuck using vectors, if you have enough duplicates, you might be better off creating a set to do the dirty work.
Let's compare three approaches:
Just using vector, sort + unique
sort( vec.begin(), vec.end() );
vec.erase( unique( vec.begin(), vec.end() ), vec.end() );
Convert to set (manually)
set<int> s;
unsigned size = vec.size();
for( unsigned i = 0; i < size; ++i ) s.insert( vec[i] );
vec.assign( s.begin(), s.end() );
Convert to set (using a constructor)
set<int> s( vec.begin(), vec.end() );
vec.assign( s.begin(), s.end() );
Here's how these perform as the number of duplicates changes (chart not reproduced here):
Summary: when the number of duplicates is large enough, it's actually faster to convert to a set and then dump the data back into a vector.
And for some reason, doing the set conversion manually seems to be faster than using the set constructor -- at least on the toy random data that I used.