Very large dictionary
Problem description
Hello,
I tried to load a 6.8G large dictionary on a server that has 128G of
memory. I got a memory error. I used Python 2.5.2. How can I load my
data?
Simon
Recommended answer
On Fri, 01 Aug 2008 00:46:09 -0700, Simon Strobl wrote:
I tried to load a 6.8G large dictionary on a server that has 128G of
memory. I got a memory error. I used Python 2.5.2. How can I load my
data?
What does "load a dictionary" mean? Was it saved with the `pickle`
module?
How about using a database instead of a dictionary?
Ciao,
Marc 'BlackJack' Rintsch
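(For context on the pickle question above: a dict saved and reloaded with the standard-library pickle module looks like the sketch below. The file name and the sample data are illustrative, not from the thread; note that a multi-gigabyte dict can need far more RAM when loaded than the pickle file's size suggests.)

```python
import os
import pickle
import tempfile

# Small stand-in for the poster's bigram counts.
bigrams = {", dk": 28893, ", dk.au": 854}

# Dump the dict to disk, then load it back into a new object.
path = os.path.join(tempfile.gettempdir(), "bigrams.pkl")
with open(path, "wb") as f:
    pickle.dump(bigrams, f)
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded[", dk"])
```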
What does "load a dictionary" mean?
I had a file bigrams.py with a content like below:
bigrams = {
", djy" : 75 ,
", djz" : 57 ,
", djzoom" : 165 ,
", dk" : 28893 ,
", dk.au" : 854 ,
", dk.b." : 3668 ,
....
}
In another file I said:
from bigrams import bigrams
How about using a database instead of a dictionary?
If there is no other way to do it, I will have to learn how to use
databases in Python. I would prefer to be able to use the same type of
scripts with data of all sizes, though.
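(For reference, a minimal key/value sketch with the standard-library sqlite3 module, which the "use a database" suggestion could mean in practice. The table name and sample rows are illustrative; an in-memory database is used here, while a real run would pass a file name to persist the data:)

```python
import sqlite3

# In-memory DB for the example; pass a filename instead to persist it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bigrams (key TEXT PRIMARY KEY, n INTEGER)")
conn.executemany(
    "INSERT INTO bigrams VALUES (?, ?)",
    [(", dk", 28893), (", dk.au", 854)],
)

# Look up a single count without holding the whole table in RAM.
row = conn.execute(
    "SELECT n FROM bigrams WHERE key = ?", (", dk",)
).fetchone()
print(row[0])
```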
Simon Strobl:
I had a file bigrams.py with a content like below:
bigrams = {
", djy" : 75 ,
", djz" : 57 ,
", djzoom" : 165 ,
", dk" : 28893 ,
", dk.au" : 854 ,
", dk.b." : 3668 ,
...
}
In another file I said:
from bigrams import bigrams
Probably there's a limit on the module size here. You can try to
change your data format on disk, creating a text file like this:
", djy" 75
", djz" 57
", djzoom" 165
....
Then in a module you can create an empty dict and read the lines of the
data with:
for line in somefile:
    part, n = line.rsplit(" ", 1)
    somedict[part.strip('"')] = int(n)
Otherwise you may have to use a BigTable, a DB, etc.
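(A self-contained version of the idea above. The sample lines are taken from the thread; the file contents are simulated in memory here, while a real run would read from a text file on disk:)

```python
import io

# Simulated contents of the data file: quoted key, a space, the count.
sample = io.StringIO('", djy" 75\n", djz" 57\n", dk" 28893\n')

bigrams = {}
for line in sample:
    # Split off the count from the right, then strip the quotes
    # around the key.
    part, n = line.rsplit(" ", 1)
    bigrams[part.strip().strip('"')] = int(n)

print(bigrams[", dk"])
```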
If there is no other way to do it, I will have to learn how to use
databases in Python. I would prefer to be able to use the same type of
scripts with data of all sizes, though.
I understand; I don't know if there are documented limits for
dicts in 64-bit Python.
Bye,
bearophile
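(As a rough illustration of why an in-memory dict can need far more RAM than the raw data it holds: the dict's hash table plus every key and value object all cost memory on top of the text itself. This is a sketch on toy data, not a measurement of the poster's dataset:)

```python
import sys

# Toy dict: 1000 short string keys mapped to integer counts.
d = {str(i): i for i in range(1000)}

# Size of the dict's own hash table...
table = sys.getsizeof(d)
# ...plus the sizes of every key and value object it references.
entries = sum(sys.getsizeof(k) + sys.getsizeof(v) for k, v in d.items())

print(table + entries)
```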