快速将内存中的数据库转储到文件 [英] Quickly dumping a database in memory to file
问题描述
我想利用将 SQLite 数据库(通过 SQLAlchemy)保存在内存中的速度优势,同时进行一次性的内容插入过程,然后将其转储到文件中,存储以备后用.
I want to take advantage of the speed benefits of holding an SQLite database (via SQLAlchemy) in memory while I go through a one-time process of inserting content, and then dump it to file, stored to be used later.
考虑一个沼泽标准数据库以通常方式创建:
Consider a bog-standard database created in the usual way:
# in-memory database
e = create_engine('sqlite://')
除了创建一个全新的数据库并手动插入每个条目之外,是否有一种更快的方法将其内容移动到光盘上?
Is there a quicker way of moving its contents to disc, other than just creating a brand new database and inserting each entry manually?
对于我是否会看到使用内存数据库的任何好处存在一些疑问.不幸的是,我已经看到了大约 120 倍的巨大时差.
There is some doubt as to whether or not I'd even see any benefits to using an in-memory database. Unfortunately I already see a huge time difference of about 120x.
这种混乱可能是由于我遗漏了问题中的一些重要细节.也可能是由于我对缓存/页面大小等缺乏了解.请允许我详细说明:
This confusion is probably due to me missing out some important detail in the question. Also probably due to a lack of understanding on my part re: caches / page sizes / etc. Allow me to elaborate:
我正在运行我设置的系统的模拟,每个模拟都经过以下阶段:
I am running simulations of a system I have set up, with each simulation going through the following stages:
- 对数据库进行一些查询.
- 根据这些查询的结果进行计算/运行模拟.
insert
根据最近的模拟将新条目插入数据库.- 通过运行
commit()
.
- Make some queries to the database.
- Make calculations / run a simulation based on the results of those queries.
insert
new entries into the database based on the most recent simulation.- Make sure the database is up to date with the new entries by running
commit()
.
虽然我只在每次模拟运行中插入十几个左右,但我确实运行了数百万次模拟,并且每次模拟的结果需要可用以便将来进行模拟.正如我所说,在运行文件支持的数据库时,这个读取和写入过程需要更长的时间;这是 6 小时和一个月的区别.
While I only ever make a dozen or so insertions on each simulation run, I do however run millions of simulations, and the results of each simulation need to be available for future simulations to take place. As I say, this read and write process takes considerably longer when running a file-backed database; it's the difference between 6 hours and a month.
希望这能澄清事情.如有必要,我可以拼凑一个简单的 Python 脚本来进一步概述我的流程.
Hopefully this clarifies things. I can cobble together a simple python script to outline my process further a little further if necessary.
推荐答案
SQLAlchemy 和 SQLite 知道如何缓存和批量插入就好了.
SQLAlchemy and SQLite know how to cache and do batch-inserts just fine.
在这里使用内存中的 SQLite 数据库没有任何好处,因为该数据库就像磁盘版本一样使用页面,唯一的区别是最终这些页面会被写入磁盘以供基于磁盘的数据库使用.性能差异只有1.5倍,见SQLite 性能基准测试 -- 为什么 :memory: 这么慢...只有磁盘速度的 1.5 倍?
There is no benefit in using an in-memory SQLite database here, because that database uses pages just like the on-disk version would, and the only difference is that eventually those pages get written to disk for disk-based database. The difference in performance is only 1.5 times, see SQLite Performance Benchmark -- why is :memory: so slow...only 1.5X as fast as disk?
以后也没有办法将内存数据库移动到基于磁盘的数据库,除非在内存数据库上运行查询并在两个单独的连接上执行批量插入到基于磁盘的数据库中.
There is also no way to move the in-memory database to a disk-based database at a later time, short of running queries on the in-memory database and executing batch inserts into the disk-based database on two separate connections.
这篇关于快速将内存中的数据库转储到文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!