peewee.OperationalError: too many SQL variables on upsert of only 150 rows * 8 columns

Problem Description

With the below example, on my machine, setting range(150) leads to the error, while range(100) does not:

from peewee import *

database = SqliteDatabase(None)

class Base(Model):
    class Meta:
        database = database


# Build a model with eight text columns dynamically.
colnames = ["A", "B", "C", "D", "E", "F", "G", "H"]
cols = {name: TextField() for name in colnames}

table = type('mytable', (Base,), cols)
database.init('test.db')
database.create_tables([table])

# 150 rows * 8 columns = 1200 bound parameters in a single INSERT.
data = []
for x in range(150):
    data.append({name: 1 for name in colnames})


with database.atomic() as txn:
    table.insert_many(data).upsert().execute()

Leads to:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 3213, in execute
    cursor = self._execute()
  File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 2628, in _execute
    return self.database.execute_sql(sql, params, self.require_commit)
  File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 3461, in execute_sql
    self.commit()
  File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 3285, in __exit__
    reraise(new_type, new_type(*exc_args), traceback)
  File "/cluster/home/ifiddes/python2.7/lib/python2.7/site-packages/peewee.py", line 3454, in execute_sql
    cursor.execute(sql, params or ())
peewee.OperationalError: too many SQL variables

This seems very low to me. I am trying to use peewee to replace an existing pandas-based SQL construction, because pandas lacks support for primary keys. Only being able to insert ~100 records per loop is very limiting, and fragile if the number of columns goes up some day.

How can I make this work better? Is it possible?

Solution

After some investigation, the problem appears to be related to the maximum number of parameters that an SQL query can have: SQLITE_MAX_VARIABLE_NUMBER.
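
For intuition, the multi-row VALUES clause that insert_many() builds contains one placeholder per cell, so the variable count is rows * columns; a minimal sketch of that arithmetic (plain Python, row and column counts as in the question):

# Each cell in the multi-row VALUES clause becomes one "?" placeholder,
# so the total variable count is rows * columns.
rows, cols = 150, 8
values = ",".join(["(" + ",".join("?" * cols) + ")"] * rows)
print(values.count("?"))  # 1200 -> over a 999 limit; 100 rows give only 800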

To be able to do big bulk inserts, I first estimate SQLITE_MAX_VARIABLE_NUMBER and then use it to chunk the list of dictionaries I want to insert.

To estimate the value, I use this function, inspired by this answer:


def max_sql_variables():
    """Get the maximum number of arguments allowed in a query by the current
    sqlite3 implementation. Based on `this question`_

    Returns
    -------
    int
        inferred SQLITE_MAX_VARIABLE_NUMBER
    """
    import sqlite3
    db = sqlite3.connect(':memory:')
    cur = db.cursor()
    cur.execute('CREATE TABLE t (test)')
    # Binary search for the largest parameter count that still executes.
    low, high = 0, 100000
    while (high - 1) > low:
        guess = (high + low) // 2
        query = 'INSERT INTO t VALUES ' + ','.join(['(?)' for _ in
                                                    range(guess)])
        args = [str(i) for i in range(guess)]
        try:
            cur.execute(query, args)
        except sqlite3.OperationalError as e:
            if "too many SQL variables" in str(e):
                high = guess
            else:
                raise
        else:
            low = guess
    cur.close()
    db.close()
    return low

SQLITE_MAX_VARIABLE_NUMBER = max_sql_variables()
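
As an aside, on Python 3.11+ the standard sqlite3 module can report this limit directly, which avoids the probing; a minimal sketch (Connection.getlimit() and the SQLITE_LIMIT_VARIABLE_NUMBER constant only exist from Python 3.11 onward, and are not part of the original answer):

import sqlite3

# getlimit() wraps SQLite's sqlite3_limit() C API (Python 3.11+).
db = sqlite3.connect(':memory:')
print(db.getlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER))
# -> typically 999 on SQLite < 3.32 and 32766 on newer builds
db.close()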

Then I use the above variable to slice the data:


with database.atomic() as txn:
    # Subtract one from the chunk size in case peewee adds a variable
    # of its own to the query.
    size = (SQLITE_MAX_VARIABLE_NUMBER // len(data[0])) - 1
    for i in range(0, len(data), size):
        table.insert_many(data[i:i+size]).upsert().execute()
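
For readers on peewee 3.x, where upsert() no longer exists, the same chunked upsert can be written with the library's chunked() helper and on_conflict_replace(); a sketch under that assumption, reusing data, table, database, and SQLITE_MAX_VARIABLE_NUMBER from above:

from peewee import chunked

# Same chunk-size reasoning as above: variables allowed per statement
# divided by variables per row, minus one as a safety margin.
size = (SQLITE_MAX_VARIABLE_NUMBER // len(data[0])) - 1

with database.atomic():
    for batch in chunked(data, size):
        # on_conflict_replace() issues INSERT OR REPLACE on SQLite,
        # the peewee 3.x equivalent of the old upsert().
        table.insert_many(batch).on_conflict_replace().execute()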


An update on the execution speed of max_sql_variables.

On a 3-year-old Intel machine with 4 cores and 4 GB of RAM, running openSUSE Tumbleweed, with SQLITE_MAX_VARIABLE_NUMBER set to 999, the function runs in less than 100 ms. If I set high = 1000000, the execution time becomes on the order of 300 ms.

On a newer Intel machine with 8 cores and 8 GB of RAM, running Kubuntu, with SQLITE_MAX_VARIABLE_NUMBER set to 250000, the function runs in about 2.6 seconds and returns 99999 (the result is capped by the initial value of high, so the probe can never report more than high - 1). If I set high = 1000000, the execution time becomes on the order of 4.5 seconds.
