(python-)sqlite3: prevent COMMIT from resetting read cursor on unrelated temporary table

Problem description

The (regrettably lengthy) MWE at the end of this question is cut down from a real application. It is supposed to work like this: There are two tables. One includes both already-processed and not-yet-processed data, the other has the results of processing the data. On startup, we create a temporary table that lists all of the data that has not yet been processed. We then open a read cursor on that table and scan it from beginning to end; for each datum, we do some crunching (omitted in the MWE) and then insert the results into the processed-data table, using a separate cursor.

This works correctly in autocommit mode. However, if the write operation is wrapped in a transaction (and in the real application it has to be, because the write actually touches several tables, all but one of which have been omitted from the MWE), then the COMMIT operation has the side effect of resetting the read cursor on the temp table. As a result, rows that have already been processed are reprocessed, which not only prevents forward progress but also crashes the program with an IntegrityError when it tries to insert a duplicate row into data_out. If you run the MWE you should see this output:

0
1
2
3
4
5
6
7
8
9
10
0
---
127 rows remaining
Traceback (most recent call last):
  File "sqlite-test.py", line 85, in <module>
    test_main()
  File "sqlite-test.py", line 83, in test_main
    test_run(db)
  File "sqlite-test.py", line 71, in test_run
    (row[0], b"output"))
sqlite3.IntegrityError: UNIQUE constraint failed: data_out.value

What can I do to prevent the read cursor from being reset by a COMMIT touching unrelated tables?

Notes: All of the INTEGERs in the schema are ID numbers; in the real application there are several more ancillary tables that hold more information for each ID, and the write transaction touches two or three of them in addition to data_out, depending on the result of the computation. In the real application, the temporary "data_todo" table is potentially very large -- millions of rows; I started down this road precisely because a Python list was too big to fit in memory. The MWE's shebang is for python3 but it will behave exactly the same under python2 (provided the interpreter is new enough to understand b"..." strings). Setting PRAGMA locking_mode = EXCLUSIVE; and/or PRAGMA journal_mode = WAL; has no effect on the phenomenon. I am using SQLite 3.8.2.

#! /usr/bin/python3

import contextlib
import sqlite3
import sys
import tempfile
import textwrap

def init_db(db):
    db.executescript(textwrap.dedent("""\
        CREATE TABLE data_in (
            origin    INTEGER,
            origin_id INTEGER,
            value     INTEGER,
            UNIQUE(origin, origin_id)
        );
        CREATE TABLE data_out (
            value     INTEGER PRIMARY KEY,
            processed BLOB
        );
        """))

    db.executemany("INSERT INTO data_in VALUES(?, ?, ?);",
                   [ (1, x, x) for x in range(100) ])
    db.executemany("INSERT INTO data_in VALUES(?, ?, ?);",
                   [ (2, x, 200 - x*2) for x in range(100) ])

    db.executemany("INSERT INTO data_out VALUES(?, ?);",
                   [ (x, b"already done") for x in range(50, 130, 5) ])

    db.execute(textwrap.dedent("""\
        CREATE TEMPORARY TABLE data_todo AS
            SELECT DISTINCT value FROM data_in
            WHERE value NOT IN (SELECT value FROM data_out)
            ORDER BY value;
        """))

def test_run(db):
    init_db(db)

    read_cur  = db.cursor()
    write_cur = db.cursor()

    read_cur.arraysize = 10
    read_cur.execute("SELECT * FROM data_todo;")

    try:
        while True:
            block = read_cur.fetchmany()
            if not block: break
            for row in block:
                # (in real life, data actually crunched here)
                sys.stdout.write("{}\n".format(row[0]))
                write_cur.execute("BEGIN TRANSACTION;")
                # (in real life, several more inserts here)
                write_cur.execute("INSERT INTO data_out VALUES(?, ?);",
                                  (row[0], b"output"))
                db.commit()

    finally:
        read_cur.execute("SELECT COUNT(DISTINCT value) FROM data_in "
                         "WHERE value NOT IN (SELECT value FROM data_out)")
        result = read_cur.fetchone()
        sys.stderr.write("---\n{} rows remaining\n".format(result[0]))

def test_main():
    with tempfile.NamedTemporaryFile(suffix=".db") as tmp:
        with contextlib.closing(sqlite3.connect(tmp.name)) as db:
            test_run(db)

test_main()

Recommended answer

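Use a second, separate connection for the temporary table; it will be unaffected by commits on the other connection. Because a temporary table is visible only to the connection that created it, data_todo has to be both created and read on that second connection.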
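A minimal sketch of that suggestion, adapted from the MWE in the question. The function names `init_db_committed` and `process_all` are hypothetical, and the per-row transaction uses the connection's context manager (`with write_db:`) instead of an explicit `BEGIN TRANSACTION`. Two details matter: the setup data must be committed before the second connection can see it, and `data_todo` must be created on the reading connection. As an assumption, the reading connection is opened with `isolation_level=None` so the Python module never opens an implicit transaction on it; since its cursor only reads the temp table, it should not hold a lock on the main database file while the writer commits.

```python
import os
import sqlite3
import tempfile


def init_db_committed(db):
    """Build the MWE's tables and rows, then COMMIT so other connections see them."""
    db.executescript("""
        CREATE TABLE data_in (
            origin    INTEGER,
            origin_id INTEGER,
            value     INTEGER,
            UNIQUE(origin, origin_id)
        );
        CREATE TABLE data_out (
            value     INTEGER PRIMARY KEY,
            processed BLOB
        );
    """)
    db.executemany("INSERT INTO data_in VALUES(?, ?, ?);",
                   [(1, x, x) for x in range(100)])
    db.executemany("INSERT INTO data_in VALUES(?, ?, ?);",
                   [(2, x, 200 - x * 2) for x in range(100)])
    db.executemany("INSERT INTO data_out VALUES(?, ?);",
                   [(x, b"already done") for x in range(50, 130, 5)])
    db.commit()  # without this, the read connection would not see the rows


def process_all(path):
    """Hypothetical driver: return (rows processed, rows still unprocessed)."""
    write_db = sqlite3.connect(path)
    # Assumption: autocommit mode, so no implicit transaction is ever opened here.
    read_db = sqlite3.connect(path, isolation_level=None)
    try:
        init_db_committed(write_db)
        # Temp tables are private to their connection, so create it on read_db.
        read_db.execute("""
            CREATE TEMPORARY TABLE data_todo AS
                SELECT DISTINCT value FROM data_in
                WHERE value NOT IN (SELECT value FROM data_out)
                ORDER BY value
        """)
        read_cur = read_db.cursor()
        read_cur.arraysize = 10
        read_cur.execute("SELECT value FROM data_todo")

        processed = 0
        while True:
            block = read_cur.fetchmany()
            if not block:
                break
            for (value,) in block:
                # (in real life, data crunched here)
                # One transaction per datum: commits on success, rolls back on error.
                # The commit happens on write_db, so read_cur (on read_db) is untouched.
                with write_db:
                    write_db.execute("INSERT INTO data_out VALUES(?, ?);",
                                     (value, b"output"))
                processed += 1

        remaining = write_db.execute(
            "SELECT COUNT(DISTINCT value) FROM data_in "
            "WHERE value NOT IN (SELECT value FROM data_out)").fetchone()[0]
        return processed, remaining
    finally:
        read_db.close()
        write_db.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        print(process_all(os.path.join(tmpdir, "test.db")))
```

With the MWE's data this should process all 138 pending values (the 127 reported as remaining plus the 11 handled before the crash) and return `(138, 0)`.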
