How to avoid computation every time a Python module is reloaded

Question

I have a Python module that uses a huge dictionary as a global variable. Currently I put the computation code at the top of the module, so every first-time import or reload takes more than a minute, which is totally unacceptable. How can I save the computation result somewhere so that the next import/reload doesn't have to recompute it? I tried cPickle, but loading the dictionary from a file (1.3M) takes approximately as long as the computation itself.

To give more information about my problem:

from nltk import FreqDist
from nltk.corpus import brown
FD = FreqDist(word for word in brown.words())  # this line of code takes 1 min
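For comparison, a pickle-based cache like the one the question mentions might look roughly like this in Python 3. The cache path is hypothetical, and `Counter` over a tiny word list stands in for the slow `FreqDist(brown.words())` call. One detail worth checking is the protocol: in Python 2, `cPickle` used the text-based protocol 0 by default, while passing `pickle.HIGHEST_PROTOCOL` selects a binary format; whether that alone makes loading faster than recomputing is not guaranteed.

```python
import os
import pickle
import tempfile
from collections import Counter

# Hypothetical cache file; a real module would use a fixed path.
CACHE_PATH = os.path.join(tempfile.mkdtemp(), "freq.pickle")

def compute_freq():
    """Stand-in for the slow FreqDist(brown.words()) computation."""
    return Counter("the cat sat on the mat".split())

def load_or_compute(path=CACHE_PATH):
    """Load the pickled dict if it exists; otherwise compute and save it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    freq = compute_freq()
    with open(path, "wb") as f:
        pickle.dump(freq, f, protocol=pickle.HIGHEST_PROTOCOL)
    return freq

FD = load_or_compute()  # first run: computes and writes the cache
FD = load_or_compute()  # later runs: reads the pickle file instead
print(FD["the"])  # 2
```

Either way, unpickling still reads the entire dictionary into memory on every start.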

Answer

Just to clarify: the code in the body of a module is not executed every time the module is imported. It runs only once, after which later imports find the already-created module in a cache rather than recreating it (an explicit reload(), however, does re-execute the module body). Take a look at sys.modules to see the list of cached modules.
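A small self-contained sketch of that caching behaviour in Python 3. The module name `slow_module` and its `TABLE` dict are hypothetical, written to a temporary directory just for the demonstration. A second `import` returns the cached module object without re-running its body, while `importlib.reload()` does re-run it and builds a new dict:

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module to a temp directory. "slow_module" and its
# TABLE dict are hypothetical stand-ins for the expensive global dict.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "slow_module.py"), "w") as f:
    f.write("TABLE = {i: i * i for i in range(1000)}\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()  # make sure the import system sees the new file

import slow_module                # first import: the module body runs
first_table = slow_module.TABLE

import slow_module as again       # found in sys.modules: the body is not re-run
print("in sys.modules:", "slow_module" in sys.modules)
print("same module object:", again is slow_module)
print("same dict after re-import:", again.TABLE is first_table)

importlib.reload(slow_module)     # reload executes the module body again
print("same dict after reload:", slow_module.TABLE is first_table)
```

The last line prints `False`: the reload rebuilt `TABLE` from scratch, which is exactly the cost the question is trying to avoid.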

However, if your problem is the time the first import takes each time the program runs, you'll probably need to use something other than a Python dict. Probably the best option is an on-disk format, for instance an sqlite database or one of the dbm modules.
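As a rough illustration of the sqlite option, the sketch below (Python 3, standard-library `sqlite3`) stores word counts in a table once and then fetches single words with a query, so nothing has to be loaded wholesale at import time. The table layout, file location and sample words are assumptions for illustration, and `Counter` stands in for NLTK's `FreqDist`.

```python
import os
import sqlite3
import tempfile
from collections import Counter

# Hypothetical database location; a real program would use a fixed path.
DB_PATH = os.path.join(tempfile.mkdtemp(), "word_freq.sqlite")

def build_cache(words, db_path=DB_PATH):
    """Slow step, run once: count the words and write the counts to disk."""
    counts = Counter(words)
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success, rolls back on error (does not close)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS freq (word TEXT PRIMARY KEY, n INTEGER)")
        conn.executemany(
            "INSERT OR REPLACE INTO freq (word, n) VALUES (?, ?)", counts.items())
    conn.close()

def lookup(word, db_path=DB_PATH):
    """Fast step: fetch one count via the primary-key index."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT n FROM freq WHERE word = ?", (word,)).fetchone()
    finally:
        conn.close()
    return row[0] if row is not None else 0

build_cache("the cat sat on the mat".split())
print(lookup("the"), lookup("dog"))  # 2 0
```

Opening a new connection per lookup keeps the sketch simple; a long-running program would more likely keep one connection open.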

For a minimal change to your interface, the shelve module may be your best option: it puts a fairly transparent layer over the dbm modules that makes them behave like an ordinary Python dict with string keys, allowing any picklable value to be stored. Here's an example:

# Create a persistent dict with a million items (run this once):
import shelve
d = shelve.open('path/to/my_persistant_dict')
# shelve keys must be strings; values can be any picklable object
d.update(('key%d' % x, x) for x in range(1000000))
d.close()

Then, in a later process, use it. There should be no large delay, because each lookup reads only the requested key from the on-disk file, so the whole dictionary doesn't have to be loaded into memory:

>>> d = shelve.open('path/to/my_persistant_dict')
>>> print(d['key99999'])
99999

It's somewhat slower than a real dict, and anything that requires all the keys (e.g. trying to print the whole thing) will still take a long time, but it may solve your problem.
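Applied to the question's situation, one possible pattern is to fill the shelf only when it doesn't already contain the data, so the slow computation happens on the first run and later imports just open the file. The sketch below (Python 3) assumes a hypothetical path, uses a made-up sentinel key `__built__` to mark a finished build, and again uses `Counter` as a stand-in for `FreqDist`.

```python
import os
import shelve
import tempfile
from collections import Counter

# Hypothetical shelf location; a real module would use a fixed path.
SHELF_PATH = os.path.join(tempfile.mkdtemp(), "freq_shelf")
BUILT_KEY = "__built__"  # made-up sentinel marking a completed build
build_calls = 0          # counts how often the slow step actually runs

def expensive_counts():
    """Stand-in for the slow FreqDist(brown.words()) computation."""
    global build_calls
    build_calls += 1
    return Counter("the cat sat on the mat".split())

def open_freq_shelf(path=SHELF_PATH):
    """Open the shelf, computing and storing the counts only if needed."""
    shelf = shelve.open(path)
    if BUILT_KEY not in shelf:
        shelf.update(expensive_counts())  # shelve keys must be strings
        shelf[BUILT_KEY] = True
        shelf.sync()
    return shelf

freq = open_freq_shelf()   # first run: computes and stores
print(freq["the"])         # 2
freq.close()

freq = open_freq_shelf()   # later run: just opens the file
print(freq["cat"], build_calls)  # 1 1
freq.close()
```

The sentinel is written last, so an interrupted first run leaves the shelf without it and the next run rebuilds the data.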
