Python 在 fetchone 上运行缓慢,在 fetchall 上挂起 [英] Python slow on fetchone, hangs on fetchall
问题描述
我正在编写一个脚本来 SELECT 查询数据库并解析大约 33,000 条记录.不幸的是,我在 cursor.fetchone()
/cursor.fetchall()
阶段遇到了问题.
I'm writing a script to SELECT query a database and parse through ~33,000 records. Unfortunately I'm running into problems at the cursor.fetchone()
/cursor.fetchall()
phase of things.
我首先尝试像这样一次遍历游标一条记录:
I first tried iterating through the cursor a record at a time like so:
# Run through every record, extract the kanji, then query for FK and weight
printStatus("Starting weight calculations")
while True:
# Get the next row in the cursor
row = cursor.fetchone()
if row == None:
break
# TODO: Determine if there's any kanji in row[2]
weight = float((row[3] + row[4]))/2
printStatus("Weight: " + str(weight))
基于 printStatus
的输出(它打印出时间戳加上传递给它的任何字符串),脚本大约需要 1 秒来处理每一行.这让我相信每次循环迭代时都会重新运行查询(使用 LIMIT 1 或其他东西),因为在 SQLiteStudio [i] 和 [/i] 返回所有 33,000 行.我计算出,按照这个速度,浏览所有 33,000 条记录大约需要 7 个小时.
Based on the output of printStatus
(it prints out a timestamp plus whatever string is passed to it), the script took approximately 1 second to process each row. This lead me to believe that the query was being re-run each time the loop iterated (with a LIMIT 1 or something), as it took ~1 second for the same query to run once in something like SQLiteStudio [i]and[/i] return all 33,000 rows. I calculated that, at that rate, it would take around 7 hours to get through all 33,000 records.
我没有坐下来,而是尝试使用 cursor.fetchall() 来代替:
Instead of sitting through that, I tried to use cursor.fetchall() instead:
results = cursor.fetchall()
# Run through every record, extract the kanji, then query for FK and weight
printStatus("Starting weight calculations")
for row in results:
# TODO: Determine if there's any kanji in row[2]
weight = float((row[3] + row[4]))/2
printStatus("Weight: " + str(weight))
不幸的是,Python 可执行文件在到达 cursor.fetchall()
行时锁定在 25% 的 CPU 和 ~6MB 的 RAM.我让脚本运行了大约 10 分钟,但什么也没发生.
Unfortunately, the Python executable locked up at 25% CPU and ~6MB of RAM when it got to the cursor.fetchall()
line. I left the script running for ~10 minutes, but nothing happened.
返回的约 33,000 行(大约 5MB 的数据)对于 Python 一次抓取是否太多?我是不是一次只能迭代一个?或者我可以做些什么来加快速度?
Is ~33,000 returned rows (about 5MB of data) too much for Python to grab at once? Am I stuck iterating through one at a time? Or is there something I can do to speed things up?
这是一些控制台输出
12:56:26.019: Adding new column 'weight' and related index to r_ele
12:56:26.019: Querying database
12:56:28.079: Starting weight calculations
12:56:28.079: Weight: 1.0
12:56:28.079: Weight: 0.5
12:56:28.080: Weight: 0.5
12:56:28.338: Weight: 1.0
12:56:28.339: Weight: 3.0
12:56:28.843: Weight: 1.5
12:56:28.844: Weight: 1.0
12:56:28.844: Weight: 0.5
12:56:28.844: Weight: 0.5
12:56:28.845: Weight: 0.5
12:56:29.351: Weight: 0.5
12:56:29.855: Weight: 0.5
12:56:29.856: Weight: 1.0
12:56:30.371: Weight: 0.5
12:56:30.885: Weight: 0.5
12:56:31.146: Weight: 0.5
12:56:31.650: Weight: 1.0
12:56:32.432: Weight: 0.5
12:56:32.951: Weight: 0.5
12:56:32.951: Weight: 0.5
12:56:32.952: Weight: 1.0
12:56:33.454: Weight: 0.5
12:56:33.455: Weight: 0.5
12:56:33.455: Weight: 1.0
12:56:33.716: Weight: 0.5
12:56:33.716: Weight: 1.0
这是 SQL 查询:
//...snip (it wasn't the culprit)...
SQLiteStudio 的 EXPLAIN QUERY PLAN 的输出:
The output of EXPLAIN QUERY PLAN from SQLiteStudio:
0 0 0 SCAN TABLE r_ele AS re USING COVERING INDEX r_ele_fk (~500000 rows)
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 1
1 0 0 SEARCH TABLE re_pri USING INDEX re_pri_fk (fk=?) (~10 rows)
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 2
2 0 0 SEARCH TABLE ke_pri USING INDEX ke_pri_fk (fk=?) (~10 rows)
2 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 3
3 0 0 SEARCH TABLE k_ele USING AUTOMATIC COVERING INDEX (value=?) (~7 rows)
3 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 4
4 0 0 SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows)
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 5
5 0 0 SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows)
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 6
6 0 0 SEARCH TABLE re_pri USING INDEX re_pri_fk (fk=?) (~10 rows)
0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 7
7 0 0 SEARCH TABLE ke_pri USING INDEX ke_pri_fk (fk=?) (~10 rows)
7 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 8
8 0 0 SEARCH TABLE k_ele USING AUTOMATIC COVERING INDEX (value=?) (~7 rows)
8 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 9
9 0 0 SEARCH TABLE k_ele USING COVERING INDEX idx_k_ele (fk=?) (~10 rows)
推荐答案
SQLite 即时计算结果记录.fetchone
很慢,因为它必须为 r_ele
中的每条记录执行所有子查询.fetchall
甚至更慢,因为它花费的时间与您对所有记录执行 fetchone
的时间一样长.
SQLite computes result records on the fly.
fetchone
is slow because it has to execute all subqueries for each record in r_ele
.
fetchall
is even slower because it takes just as long as if you had executed fetchone
for all records.
SQLite 3.7.13 估计 value
列上的所有查找都会非常慢,因此为此查询创建了一个临时索引.您应该创建一个永久索引,以便 SQLite 3.6.21 可以使用它:
SQLite 3.7.13 estimates that all the lookups on the value
column would be horribly slow, and therefore creates a temporary index for this query.
You should create a permanent index so that it can be used by SQLite 3.6.21:
CREATE INDEX idx_k_ele_value ON k_ele(value);
如果这没有帮助,请更新到具有较新 SQLite 版本的 Python,或使用其他内置较新 SQLite 版本的数据库库,例如 APSW.
If that does not help, update to a Python with a newer SQLite version, or use another database library with a newer SQLite version built-in, such as APSW.
这篇关于Python 在 fetchone 上运行缓慢,在 fetchall 上挂起的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!