Shelve is too slow for large dictionaries, what can I do to improve performance?


Problem description

I am storing a table using python and I need persistence.

Essentially, I am storing the table as a dictionary that maps strings to numbers, and the whole thing is stored with shelve:

self.DB=shelve.open("%s%sMoleculeLibrary.shelve"%(directory,os.sep),writeback=True) 

I set writeback to True because I found the system tends to be unstable if I don't.

After the computations, the system needs to close the database and store it back. The database (the table) is now about 540MB, and this is taking ages. The time exploded once the table grew to about 500MB, but I need a much bigger table. In fact, I need two of them.

I am probably using the wrong form of persistence. What can I do to improve performance?

Solution

For storing a large dictionary of string : number key-value pairs, I'd suggest a JSON-native storage solution such as MongoDB. It has a wonderful API for Python, PyMongo. MongoDB itself is lightweight and very fast, and its JSON-like documents map natively to dictionaries in Python. This means that you can use your string key as the document's _id, which MongoDB indexes automatically, allowing for compact storage and quick lookup.

As an example of how easy the code would be, see the following:

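from pymongo import MongoClient  # replaces the old Connection class, removed in PyMongo 3

d = {'string1': 1, 'string2': 2, 'string3': 3}

client = MongoClient()  # assumes a MongoDB server running on localhost:27017
db = client['example-database']
collection = db['example-collection']

# store each pair as one document, using the string key as the _id
for string, num in d.items():
    # replace_one with upsert=True stands in for collection.save, removed in PyMongo 4
    collection.replace_one({'_id': string}, {'_id': string, 'value': num}, upsert=True)

# testing: read everything back into a plain dictionary
newD = {}
for obj in collection.find():
    newD[obj['_id']] = obj['value']
print(newD)
# newD now holds the same pairs as d, assuming the collection contained nothing else

On Python 2 the keys come back as unicode strings, so you'd just have to convert back from unicode, which is trivial. On Python 3 no conversion is needed.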
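
Separately from switching stores, the writeback=True setting in the question is worth a second look. According to the shelve documentation, writeback caches every accessed entry in memory and writes all of them back on sync() and close(), which can make close very slow for large shelves. Since the values here are plain numbers (immutable), assigning them directly to the shelf is enough to persist them. The following is only a minimal sketch under that assumption: the helper names store_counts and load_counts and the temporary file location are hypothetical, and whether this avoids the instability the asker saw is unknown.

```python
import os
import shelve
import tempfile


def store_counts(path, counts):
    """Write string -> number pairs to a shelf opened without writeback."""
    # writeback defaults to False: each assignment is pickled and written to
    # the underlying dbm file right away, so close() has no cache to flush.
    with shelve.open(path) as db:
        for key, value in counts.items():
            db[key] = value


def load_counts(path):
    """Read every pair from the shelf back into a plain dict."""
    with shelve.open(path, flag="r") as db:
        return dict(db)


if __name__ == "__main__":
    # Hypothetical location in a temporary directory, for illustration only.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "MoleculeLibrary.shelve")
        store_counts(path, {"string1": 1, "string2": 2, "string3": 3})
        print(load_counts(path))
```

With writeback off, an update has to be an explicit assignment (db[key] = new_value). Mutating a stored list or dict in place would not be saved, but that does not arise for numeric values.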
