SQLite: efficient way to drop lots of rows

Question

SQLite, Android, true story. I have a table which I use as a cache:
CREATE TABLE cache(key TEXT, ts TIMESTAMP, size INTEGER, data BLOB);
CREATE UNIQUE INDEX by_key ON cache(key);
CREATE INDEX by_ts ON cache(ts);
During the app's lifetime I fill the cache, and at some point I want to clear it out and drop N records. Typically this table will contain ~25000 blobs of ~100-500Kb each, with a total blob size in the DB of 600-800Mb, but for now I test with ~2000 blobs totalling about 60Mb (the numbers below are for this case). Clearing removes 90% of the cache entries.
I tried different ways to do it, briefly described here:

[1] Worst and simplest: first select, then remove one by one, walking the cursor. Terribly slow.
[2] Make SQLite do it with a single query (delete the blobs with a total of N bytes in them):
DELETE FROM cache WHERE
ROWID IN (SELECT ROWID FROM cache WHERE
    (SELECT SUM(size) FROM cache AS _ WHERE ts <= cache.ts) <= N);
This is faster, but still terribly slow: ~15 sec. It also seems to have quadratic complexity.
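Approach [2] can be sketched with Python's built-in `sqlite3` module (the Android cache would use the platform's SQLite API instead; the tiny demo data and the threshold `N` here are assumptions for illustration):

```python
import sqlite3

# Sketch of approach [2]: a single DELETE whose correlated subquery selects
# the oldest blobs whose cumulative size stays within N bytes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache(key TEXT, ts TIMESTAMP, size INTEGER, data BLOB)")
conn.executemany(
    "INSERT INTO cache VALUES (?, ?, ?, ?)",
    [(f"k{i}", i, 100, b"x" * 100) for i in range(10)],  # 10 blobs, 100 bytes each
)
conn.commit()

N = 500  # drop the oldest blobs totalling up to N bytes
with conn:  # one transaction, one commit for the whole delete
    conn.execute(
        """DELETE FROM cache WHERE rowid IN (
               SELECT rowid FROM cache AS c
               WHERE (SELECT SUM(size) FROM cache AS _ WHERE ts <= c.ts) <= ?
           )""",
        (N,),
    )
print(conn.execute("SELECT COUNT(*) FROM cache").fetchone()[0])  # 5 rows remain
```

The inner `SUM(size)` is re-evaluated for every candidate row, which is where the apparent quadratic behavior comes from.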
[3] Select the row around which to cut (using the average blob size for the computation) and delete with a simple WHERE clause:
-- Find the row after which to delete; let its timestamp be T0:
SELECT ts FROM cache ORDER BY ts LIMIT 1 OFFSET count;
-- Delete
DELETE FROM cache WHERE ts < T0;
This is much better, but still takes ~7 sec.
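Approach [3] as a runnable two-step sketch in Python's `sqlite3` (demo data and the `count` value are assumptions):

```python
import sqlite3

# Sketch of approach [3]: find the cutoff timestamp with LIMIT/OFFSET,
# then delete everything older with one simple, index-friendly WHERE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache(key TEXT, ts TIMESTAMP, size INTEGER, data BLOB)")
conn.execute("CREATE INDEX by_ts ON cache(ts)")
conn.executemany(
    "INSERT INTO cache VALUES (?, ?, ?, ?)",
    [(f"k{i}", i, 100, b"x" * 100) for i in range(10)],
)
conn.commit()

count = 7  # estimated number of oldest rows to drop
with conn:  # keep both steps in one transaction
    row = conn.execute(
        "SELECT ts FROM cache ORDER BY ts LIMIT 1 OFFSET ?", (count,)
    ).fetchone()
    if row is not None:  # fewer than `count` rows: nothing to trim
        conn.execute("DELETE FROM cache WHERE ts < ?", (row[0],))
print(conn.execute("SELECT COUNT(*) FROM cache").fetchone()[0])  # 3 rows remain
```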
[4] Create a new table, copy the rows I need to keep, and drop the old one. Note that I create the indices on the new table AFTER I have copied all the data:
-- Insert only the rows I want to keep
INSERT INTO temp(key, ts, size, data) SELECT key, ts, size, data
FROM cache ORDER BY ts LIMIT count;
-- Drop table and indices.
DROP INDEX by_key;
DROP INDEX by_ts;
DROP TABLE cache;
-- Rename temp table and create indices...
Copying takes about 300 ms for 6Mb of blobs, but DROP TABLE takes ~8 sec.

Note that in all cases I run VACUUM, which takes another ~1 sec. How can I make this fast? Why are DROP TABLE and the deletion so slow? I think it might be because of the indices: when I dropped the key index before the DELETE, it worked faster. How do I make SQLite delete fast?
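For reference, approach [4] end to end as a Python `sqlite3` sketch (the temp-table name, the demo data, and keeping the *newest* `count` rows are assumptions, since the question elides the rename step):

```python
import sqlite3

# Sketch of approach [4]: copy the rows to keep into a fresh table, drop the
# old table, rename, and only then recreate the indices.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache(key TEXT, ts TIMESTAMP, size INTEGER, data BLOB)")
conn.execute("CREATE UNIQUE INDEX by_key ON cache(key)")
conn.execute("CREATE INDEX by_ts ON cache(ts)")
conn.executemany(
    "INSERT INTO cache VALUES (?, ?, ?, ?)",
    [(f"k{i}", i, 100, b"x") for i in range(10)],
)
conn.commit()

count = 2  # number of newest rows to keep
with conn:  # all the DDL and the copy are transactional in SQLite
    conn.execute(
        "CREATE TABLE temp_cache(key TEXT, ts TIMESTAMP, size INTEGER, data BLOB)"
    )
    conn.execute(
        """INSERT INTO temp_cache SELECT key, ts, size, data
           FROM cache ORDER BY ts DESC LIMIT ?""",
        (count,),
    )
    conn.execute("DROP INDEX by_key")
    conn.execute("DROP INDEX by_ts")
    conn.execute("DROP TABLE cache")
    conn.execute("ALTER TABLE temp_cache RENAME TO cache")
    # Recreate indices only after the bulk copy, as the question describes.
    conn.execute("CREATE UNIQUE INDEX by_key ON cache(key)")
    conn.execute("CREATE INDEX by_ts ON cache(ts)")
print(conn.execute("SELECT COUNT(*) FROM cache").fetchone()[0])  # 2 rows remain
```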
Answer
You are working on a database with "big" data - that is, each blob uses multiple pages.

At some point, near optimal performance, you will reach a limit you cannot improve past.
Checking all your choices, I see different behaviors, not just different algorithms.
[1] This one shouldn't be terribly slow as long as you use a transaction. You need two operations at once: a query (to get the blob size) and a delete.
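The point about the transaction can be sketched in Python's `sqlite3` (demo data is an assumption): without an explicit transaction, each single-row DELETE would pay for its own commit; wrapped in one transaction, SQLite commits once.

```python
import sqlite3

# Sketch: approach [1] (row-by-row deletes) made tolerable by wrapping
# all the deletes in a single transaction - one commit instead of 90.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache(key TEXT, ts TIMESTAMP, size INTEGER, data BLOB)")
conn.executemany(
    "INSERT INTO cache VALUES (?, ?, ?, ?)",
    [(f"k{i}", i, 100, b"x") for i in range(100)],
)
conn.commit()

# Walk a cursor over the candidate rows, deleting one by one.
victims = conn.execute("SELECT rowid FROM cache ORDER BY ts LIMIT 90").fetchall()
with conn:  # the transaction around the loop is what saves this approach
    for (rowid,) in victims:
        conn.execute("DELETE FROM cache WHERE rowid = ?", (rowid,))
print(conn.execute("SELECT COUNT(*) FROM cache").fetchone()[0])  # 10 rows remain
```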
[2] This is a good approach. The two queries and the delete are all in a single command, so the SQLite engine can optimize.
[3] This behaves differently from all of the above. It is the same as DELETE FROM cache WHERE ts < (SELECT ts FROM cache ORDER BY ts LIMIT 1 OFFSET count). The query is less expensive than the previous one, but I bet the number of rows deleted is far lower than in the previous one too! The expensive part of query-plus-delete is the delete! Query optimization is important, but things will always get slower in the delete.
[4] This is a very bad approach! Copying all your data to a new table - maybe even another database - will be VERY expensive. There is only one advantage to it: you may copy the data to a new database and avoid VACUUM, since the new database is built from scratch and is clean.
About VACUUM... Worse than DELETE is VACUUM. Vacuum is not supposed to be run often on a database. I understand this algorithm is supposed to "clean" your database, but cleaning shouldn't be a frequent operation - databases are optimized for select/insert/delete/update, not for keeping all data at a minimal size.
My choice would be a single DELETE ... IN (SELECT ...) operation, according to predefined criteria. VACUUM wouldn't be used, or at least not often. A good option would be to monitor the db size: when it runs over a limit, run the admittedly expensive cleaning to trim the database.
Lastly, when using multiple commands, never forget to use transactions!