使用STL映射并设置大量数据 [英] Using STL map and set on large amount of data

查看:78
本文介绍了使用STL映射并设置大量数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要一种有效的方法(时间是我的主要关注点)来生成10个
000 000个独特的16个字符的字母数字字符串。


I使用STL设置和地图但是在大约5 000 000个条目之后,它变得非常慢,即使我的计算机上仍有足够的RAM可用


你有什么建议吗?


这里是我用来确保唯一性的代码示例。


typedef pair< string,bool> StringBool_pair;

map< string,bool> MapValues;

pair< map< string,bool> :: iterator,bool> pr;


for(int iii = 0; iii< 10000000; iii ++){

strCode = random16CharacterCode();

pr = MapValues.insert(StringBool_pair(strCode,true));

if(pr.second == true){

//接受...

}否则{

//拒绝......

}

}


谢谢

解决方案

Eric写道:

我需要一种有效的方式(时间是我的主要关注点这里)生成10个独特的字母数字字符串,每个字符串16个字符。

我使用STL设置和映射但是在大约5 000 000个条目之后,它变得很慢,即使我的计算机上仍有足够的RAM可用

您有什么建议吗?

这里是我用来确保唯一性的代码示例。

typedef pair< string,bool> StringBool_pair;
map< string,bool> MapValues;
对< map< string,bool> :: iterator,bool> pr;

for(int iii = 0; iii< 10000000; iii ++){
strCode = random16CharacterCode();
pr = MapValues.insert(StringBool_pair(strCode,true) );
if(pr.second == true){
//接受......
}否则{
//拒绝......
}
}

谢谢




查看你的编译器是否有hash_set的实现。

非标准,但许多编译器将其作为扩展。例如,你可以在命名空间__gnu_cxx中使用g ++获取



#include< \\ text / hash_set>

# include< string>


int main()

{

__gnu_cxx :: hash_set< std :: string> h;

//这里有趣的事情。

返回0;

}

这应该是一样的接口为std :: set,但使用哈希表

而不是平衡二叉树(通常实现

std :: set),因此给出了预期的O (1)插入和查找,

而不是O(lg n)。


这里有一些关于hash_set的好文档:
http://www.sgi.com/tech/stl/hash_set.html


-Alan


Eric写道:

我需要一种有效的方式(时间是我主要关注的是)生成10个独特的字母数字字符串,每个字符串包含16个字符。

我使用STL设置和地图但是在大约5 000 000个条目之后,它变成了
非常慢,即使我的计算机上仍有足够的RAM可用

你有什么建议吗?




你能否更具体地说明你的琴弦上的任何其他限制?

如果唯一要求是它们是唯一的,那么你可以先列出

第一个10 ^ 7按字母顺序排列,从aaa开始... aaa。


如果他们需要随机,那么你可以做各种鬼鬼祟祟的事情

赢了不会太多地影响随机性(对于大多数用途)。例如,

你可以修复字符串的第一个字符并生成15个随后的

随机字符。为每个不同的角色做10 ^ 7 /(不同字符数)次

。这将使你可以单独使用大量的b $ b b b小型数据集。


其他可能性也存在,它只取决于你的多少$

系统的内置随机性你愿意牺牲。


Eric写道:

I需要一种有效的方法(时间是我的主要关注点)来生成10个独特的字母数字字符串,每个字符串包含16个字符。

我使用STL设置和映射但是在大约5 000 000个条目之后,即使我的计算机上仍有足够的RAM,它也变得很慢

你有什么建议吗?

这里是我用的代码示例确保唯一性。

typedef pair< string,bool> StringBool_pair;
map< string,bool> MapValues;
对< map< string,bool> :: iterator,bool> pr;

for(int iii = 0; iii< 10000000; iii ++){
strCode = random16CharacterCode();
pr = MapValues.insert(StringBool_pair(strCode,true) );
if(pr.second == true){
//接受......
}否则{
//拒绝......
}
}

谢谢




除了其他人提到的想法(例如生成字符串)

deterministicaly使用一个或另一个列表算法,或使用

随机字符串生成器,在一段时间内不会产生重复

大到足以满足您的目的)你可能想尝试一个线索。这是一个用于存储字符串集的数据结构。它具有以下

属性:插入新字符串需要的时间与字符串的长度成比例(而不是字符串中的字符串数);测试是否

或者某个给定字符串已经是该集合的成员需要时间

与字符串的长度成正比(而不是

套装。


那么什么是特里?该名称应该来自reTRIEval。假设

你有一组字符串{HIS,HERS,HIM,HE}然后trie

看起来像(ascii art coming,希望你有一个等宽字体):


+

|

+ -H - + < br $> b $ b |

+ -E

| |

| + -


I need an effective way (time is my main concern here) to generate 10
000 000 unique alphanumeric strings of 16 characters each.

I used STL set and map but after about 5 000 000 entries, it becomes
very slow, even if I still have enough RAM available on my computer

Do you have any advice?

Here a code sample that I used to ensure uniqueness.

typedef pair<string, bool> StringBool_pair;
map<string, bool> MapValues;
pair<map<string,bool>::iterator,bool> pr;

for (int iii=0; iii<10000000; iii++) {
strCode = random16CharacterCode();
pr = MapValues.insert(StringBool_pair(strCode, true));
if(pr.second == true) {
// Accepted...
} else {
// Rejected...
}
}

Thank you

解决方案

Eric wrote:

I need an effective way (time is my main concern here) to generate 10
000 000 unique alphanumeric strings of 16 characters each.

I used STL set and map but after about 5 000 000 entries, it becomes
very slow, even if I still have enough RAM available on my computer

Do you have any advice?

Here a code sample that I used to ensure uniqueness.

typedef pair<string, bool> StringBool_pair;
map<string, bool> MapValues;
pair<map<string,bool>::iterator,bool> pr;

for (int iii=0; iii<10000000; iii++) {
strCode = random16CharacterCode();
pr = MapValues.insert(StringBool_pair(strCode, true));
if(pr.second == true) {
// Accepted...
} else {
// Rejected...
}
}

Thank you



See if your compiler has an implementation of hash_set. It is
nonstandard, but many compilers have it as an extension. For example,
in g++ you access it in the namespace __gnu_cxx:

#include <ext/hash_set>
#include <string>

int main()
{
__gnu_cxx::hash_set<std::string> h ;
// Do interesting things here.
return 0 ;
}
This should have the same interface as std::set, but uses a hash table
instead of a balanced binary tree (the usual implementation of
std::set), therefore giving you expected O(1) insertion and lookup,
instead of O(lg n).

Here is some good documentation for hash_set:
http://www.sgi.com/tech/stl/hash_set.html

-Alan


Eric wrote:

I need an effective way (time is my main concern here) to generate 10
000 000 unique alphanumeric strings of 16 characters each.

I used STL set and map but after about 5 000 000 entries, it becomes
very slow, even if I still have enough RAM available on my computer

Do you have any advice?



Can you be more specific about any other constraints on your strings?
If the sole requirement is that they be unique, then you can just list
the first 10^7 in alphabetical order, starting with aaa...aaa.

If they need to be random then you can do various sneaky things that
won''t affect the randomness too much (for most purposes). For example,
you can fix the first character of the string and generate 15 subsequent
random characters. Do this 10^7 / (number of distinct characters) times
for each distinct character. That will let you work with substantially
smaller data sets individually.

Other possibilities exist too, it just depends upon how much of your
system''s built-in randomness you''re willing to sacrifice.


Eric wrote:

I need an effective way (time is my main concern here) to generate 10
000 000 unique alphanumeric strings of 16 characters each.

I used STL set and map but after about 5 000 000 entries, it becomes
very slow, even if I still have enough RAM available on my computer

Do you have any advice?

Here a code sample that I used to ensure uniqueness.

typedef pair<string, bool> StringBool_pair;
map<string, bool> MapValues;
pair<map<string,bool>::iterator,bool> pr;

for (int iii=0; iii<10000000; iii++) {
strCode = random16CharacterCode();
pr = MapValues.insert(StringBool_pair(strCode, true));
if(pr.second == true) {
// Accepted...
} else {
// Rejected...
}
}

Thank you



In addition to the ideas others have mentioned (e.g. generating strings
deterministicaly using some one or another listing algorithm, or using a
random string generator that does not generate duplicates over a period
large enough for your purposes) you might want to try a trie. This is a
data structure for storing sets of strings. It has the following
properties: insertion of a new string takes time proportional to the length
of the string (and not the number of strings in the set); testing whether
or not a given string is already a member of the set takes time
proportional to the length of the string (and not the number of strings in
the set).

So what''s a "trie"? The name supposedly derives from "reTRIEval". Suppose
you have the set of strings {"HIS", "HERS", "HIM", "HE"} Then the trie
would look like (ascii art coming, hope you have a monospaced font):

+
|
+-H--+
|
+-E
| |
| +-


这篇关于使用STL映射并设置大量数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆