Store/retrieve a data structure
Question
I have implemented a suffix tree in Python to do full-text searches, and it's working really well. But there's a problem: the indexed text can be very big, so we won't be able to keep the whole structure in RAM.
IMAGE: Suffix tree for the word BANANAS
(in my scenario, imagine a tree 100000 times bigger).
So, researching a little bit about it, I found the pickle module, a great Python module for "loading" and "dumping" objects from/into files, and guess what? It works wonderfully with my data structure.
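To make that dump/load round-trip concrete, here is a minimal sketch. It assumes a hypothetical Node class and builds a naive, uncompressed suffix trie as a stand-in for the actual suffix tree; only pickle.dump and pickle.load are the real API being illustrated.

```python
import os
import pickle
import tempfile


class Node:
    """Hypothetical trie node: maps an edge character to a child Node."""

    def __init__(self):
        self.children = {}


def build_suffix_trie(text):
    """Naive O(n^2) suffix trie; '$' marks the end of every suffix."""
    root = Node()
    text += "$"
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.children.setdefault(ch, Node())
    return root


def contains(root, pattern):
    """Return True if pattern is a substring of the indexed text."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return False
    return True


tree = build_suffix_trie("BANANAS")
path = os.path.join(tempfile.mkdtemp(), "suffix_tree.pkl")

# Dump the whole object graph to one file...
with open(path, "wb") as f:
    pickle.dump(tree, f, protocol=pickle.HIGHEST_PROTOCOL)

# ...and load all of it back into memory.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(contains(loaded, "ANAN"))  # True
print(contains(loaded, "NAB"))   # False
```

Note that pickle.load rebuilds the entire object graph in RAM, so this alone does not address the size problem; very deep structures can also hit Python's recursion limit while being pickled.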
So, to make a long story short: what would be the best strategy to store and retrieve this structure on/from disk? I mean, one solution could be to store each node in a file and load it from disk whenever it is needed, but this isn't the best thing to do (too many disk accesses).
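One middle ground between a single pickle and one file per node, offered here as an assumption rather than the asker's or the answerer's method, is to give each node an integer ID and store the nodes as records in a single key/value file. The sketch below uses the standard-library shelve module for that, reads a node from disk only when a search walks through it, and keeps recently used nodes in an LRU cache. build_node_table, save and DiskTrie are hypothetical names, and the trie is a naive uncompressed stand-in for a real suffix tree.

```python
import os
import shelve
import tempfile
from functools import lru_cache


def build_node_table(text):
    """Hypothetical helper: naive suffix trie as a flat table
    node_id -> {char: child_id}. Integer IDs replace object references
    so each node can be stored and loaded on its own."""
    text += "$"
    table = {0: {}}
    for i in range(len(text)):
        node = 0
        for ch in text[i:]:
            child = table[node].get(ch)
            if child is None:
                child = len(table)
                table[child] = {}
                table[node][ch] = child
            node = child
    return table


def save(table, path):
    # shelve keys must be strings; each node is pickled as its own record.
    with shelve.open(path, flag="n") as db:
        for node_id, children in table.items():
            db[str(node_id)] = children


class DiskTrie:
    """Hypothetical reader that loads nodes lazily from the shelf."""

    def __init__(self, path, cache_size=1024):
        self._db = shelve.open(path, flag="r")
        # Keep recently used nodes (e.g. those near the root) in RAM.
        self._load = lru_cache(maxsize=cache_size)(self._read)

    def _read(self, node_id):
        return self._db[str(node_id)]

    def contains(self, pattern):
        node = 0
        for ch in pattern:
            node = self._load(node).get(ch)
            if node is None:
                return False
        return True

    def close(self):
        self._db.close()


path = os.path.join(tempfile.mkdtemp(), "trie")
save(build_node_table("BANANAS"), path)

trie = DiskTrie(path)
found = trie.contains("NANA")    # True
missing = trie.contains("SAB")   # False
trie.close()
print(found, missing)
```

Each cache miss is still one lookup in the shelf file, so a disk-oriented design would usually also group nodes that tend to be visited together (for example, a whole small subtree) into the same record to reduce the number of reads.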
Footnote: Although I have tagged this question as python, the programming language isn't the important part of the question; the disk storing/retrieving strategy is the main point.
Recommended answer
If pickle is already working for you, you may want to take a look at ZODB, which adds some functionality on top of pickle. Looking at the documentation, I saw this paragraph that seems to address the size concerns you're having:
The database moves objects freely between memory and storage. If an object has not been used in a while, it may be released and its contents loaded from storage the next time it is used.