Update large sqlite database in chunks
Question
I have an SQLite database (approx. 11 GB) that has multiple tables, including the tables distance and vertices. The table distance is pretty large (120 million rows); vertices is smaller (15,000 rows). I want to use sqlite3 in Python to update one column of distance with values from another column in vertices. The table vertices has an index on column cat and another index on orig_cat.
What I am doing:
import sqlite3
db_path='path/to/db.db'
conn = sqlite3.connect(db_path)
cur = conn.cursor()
cur.execute('''UPDATE distance SET
from_orig_v = (SELECT orig_cat FROM vertices WHERE cat=distance.source)''')
However, running that update statement on such a large database causes a memory error. The memory usage increases steadily until it crashes. I am looking for advice on how to perform such a large update statement without running out of memory. Maybe processing the update in chunks (i.e. rows of the distance table) and committing after, e.g., 1000 updates to free memory? How would that be done in Python/sqlite?
Answer
It should be possible to update in chunks with statements like this:
UPDATE distance SET ... WHERE rowid BETWEEN 100000 AND 200000;
You don't need to use multiple transactions; the only thing that actually must be kept in memory is the list of rows to be updated in a single statement. (In theory, when memory runs out, you should get an appropriate error message. In practice, some OSes overcommit memory and don't tell the application about it until it's too late.)
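A chunked version of the update could be sketched in Python like this. The table and column names (distance, vertices, source, from_orig_v, cat, orig_cat) are taken from the question; the loop assumes distance keeps SQLite's default rowid. The example uses an in-memory database with a few sample rows so it is self-contained; against the real 11 GB file you would connect to its path instead and use a chunk size on the order of 100,000:

```python
import sqlite3

# In-memory stand-in for the real database; with the question's setup you
# would instead do sqlite3.connect('path/to/db.db') on the existing tables.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE vertices (cat INTEGER, orig_cat INTEGER)')
cur.execute('CREATE TABLE distance (source INTEGER, from_orig_v INTEGER)')
cur.executemany('INSERT INTO vertices VALUES (?, ?)',
                [(i, i * 10) for i in range(1, 6)])
cur.executemany('INSERT INTO distance (source) VALUES (?)',
                [(i % 5 + 1,) for i in range(20)])

chunk = 7  # tiny for the demo; something like 100000 on the real table

# Determine the rowid range to cover (assumes the default rowid exists).
cur.execute('SELECT MIN(rowid), MAX(rowid) FROM distance')
lo, hi = cur.fetchone()

for start in range(lo, hi + 1, chunk):
    cur.execute('''UPDATE distance SET
        from_orig_v = (SELECT orig_cat FROM vertices
                       WHERE cat = distance.source)
        WHERE rowid BETWEEN ? AND ?''', (start, start + chunk - 1))
    conn.commit()  # committing per chunk keeps the journal small
```

Committing after every chunk is optional as far as memory is concerned (as noted above, only the rows of a single statement need to be held), but it keeps each transaction and its journal small.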