Selecting between shelve and sqlite for really large dictionary (Python)


Problem description

I have a large Python dictionary of vectors (150k vectors, 10k dimensions each) of floats that can't be loaded into memory, so I have to use one of the two methods to store it on disk and retrieve specific vectors when needed. The vectors will be created and stored once, but might be read many (thousands of) times, so efficient reading is really important. After some tests with the shelve module, I tend to believe that sqlite will be a better option for this kind of task, but before I start writing code I would like to hear some more opinions on this. For example, are there any other options besides those two that I'm not aware of?

Now, assuming we agree that the best option is sqlite, another question relates to the exact form of the table. I'm thinking of using a fine-grained structure with rows of the form vector_key, element_no, value to help efficient pagination, instead of storing all 10k elements of a vector into the same record. I would really appreciate any suggestions on this issue.
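The fine-grained layout described above could be sketched roughly as follows with the standard-library sqlite3 module. The table name `vector_elements`, the helper function names, and the use of a text key are assumptions made for illustration only; the composite primary key is one possible way to support reading a slice ("page") of a vector.

```python
import sqlite3


def create_fine_grained_table(conn):
    # One row per vector element. The composite primary key lets SQLite
    # locate the elements of one vector through its index.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS vector_elements (
            vector_key TEXT NOT NULL,
            element_no INTEGER NOT NULL,
            value REAL NOT NULL,
            PRIMARY KEY (vector_key, element_no)
        )
        """
    )


def store_vector(conn, vector_key, values):
    # Hypothetical helper: writes every element of one vector as its own row.
    rows = ((vector_key, i, float(v)) for i, v in enumerate(values))
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO vector_elements (vector_key, element_no, value) "
            "VALUES (?, ?, ?)",
            rows,
        )


def load_slice(conn, vector_key, start, stop):
    # Hypothetical helper: returns elements start..stop-1 of one vector.
    cur = conn.execute(
        "SELECT value FROM vector_elements "
        "WHERE vector_key = ? AND element_no >= ? AND element_no < ? "
        "ORDER BY element_no",
        (vector_key, start, stop),
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")  # pass a file path for on-disk storage
create_fine_grained_table(conn)
store_vector(conn, "doc1", [0.5, 1.5, 2.5, 3.5])
print(load_slice(conn, "doc1", 1, 3))  # prints [1.5, 2.5]
```

With 150k vectors of 10k elements each, this layout would hold 1.5 billion rows, so it is probably worth measuring read times on a sample before committing to it.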

Solution

You want sqlite3. If you then use an ORM like sqlalchemy, you can easily grow and move to other back-end databases.
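As a point of comparison with the fine-grained table the question proposes, here is a minimal sketch of the other layout it mentions: all elements of a vector stored in a single record. It uses only the standard-library sqlite3 and array modules. The table name `vectors`, the helper names, and the choice to pack the floats into a BLOB are assumptions for illustration, not something the answer prescribes.

```python
import sqlite3
from array import array


def create_vector_table(conn):
    # One row per vector; the primary key gives a direct lookup by key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vectors ("
        "vector_key TEXT PRIMARY KEY, data BLOB NOT NULL)"
    )


def put_vector(conn, vector_key, values):
    # Pack the floats as 8-byte doubles in native byte order
    # (an assumption: the file is read on the same kind of machine).
    blob = array("d", values).tobytes()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO vectors (vector_key, data) VALUES (?, ?)",
            (vector_key, blob),
        )


def get_vector(conn, vector_key):
    row = conn.execute(
        "SELECT data FROM vectors WHERE vector_key = ?", (vector_key,)
    ).fetchone()
    if row is None:
        raise KeyError(vector_key)
    vec = array("d")
    vec.frombytes(row[0])  # unpack the BLOB back into doubles
    return vec.tolist()


conn = sqlite3.connect(":memory:")  # pass a file path for on-disk storage
create_vector_table(conn)
put_vector(conn, "doc1", [0.5, 1.5, 2.5])
print(get_vector(conn, "doc1"))  # prints [0.5, 1.5, 2.5]
```

In this sketch a 10k-dimensional vector occupies 80,000 bytes in one row and comes back with a single keyed lookup, at the cost of not being able to fetch part of a vector through SQL alone.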

Shelve is more of a "toy" than actually useful in production code.

The other point you are talking about is called normalization. I have personally never been very good at it, but this should explain it for you.

Just as an extra note, this shows performance failures in shelve vs. sqlite3.
