为什么SQLAlchemy count()比原始查询慢得多? [英] Why is SQLAlchemy count() much slower than the raw query?
问题描述
我正在将SQLAlchemy与MySQL数据库配合使用,我想对表中的行进行计数(大约300k). SQLAlchemy count 函数所花费的时间大约是其50倍直接在MySQL中编写相同的查询即可运行.我在做错什么吗?
I'm using SQLAlchemy with a MySQL database and I'd like to count the rows in a table (roughly 300k). The SQLAlchemy count function takes about 50 times as long to run as writing the same query directly in MySQL. Am I doing something wrong?
# this takes over 3 seconds to return
session.query(Segment).count()
但是:
SELECT COUNT(*) FROM segments;
+----------+
| COUNT(*) |
+----------+
| 281992 |
+----------+
1 row in set (0.07 sec)
速度差异随着表的大小而增加(在10万行以下几乎看不到).
The difference in speed increases with the size of the table (it is barely noticeable under 100k rows).
更新
使用session.query(Segment.id).count()
而不是session.query(Segment).count()
似乎可以解决问题,并使其达到最高速度.我仍然感到困惑,为什么初始查询会变慢.
Using session.query(Segment.id).count()
instead of session.query(Segment).count()
seems to do the trick and get it up to speed. I'm still puzzled why the initial query is slower though.
推荐答案
不幸的是,MySQL对子查询的支持非常糟糕,这给我们带来了非常不利的影响. SQLAlchemy文档指出可以使用query(func.count(Segment.id))
来实现优化"查询:
Unfortunately MySQL has terrible, terrible support of subqueries and this is affecting us in a very negative way. The SQLAlchemy docs point out that the "optimized" query can be achieved using query(func.count(Segment.id))
:
返回此查询将返回的行数.
Return a count of rows this Query would return.
这将为此查询生成SQL,如下所示:
This generates the SQL for this Query as follows:
SELECT count(1) AS count_1 FROM (
SELECT <rest of query follows...> ) AS anon_1
要对要计数的特定列进行精细控制,请跳过 使用子查询或以其他方式控制FROM子句,或使用 其他聚合函数,请将func表达式与 query(),即:
For fine grained control over specific columns to count, to skip the usage of a subquery or otherwise control of the FROM clause, or to use other aggregate functions, use func expressions in conjunction with query(), i.e.:
from sqlalchemy import func
# count User records, without
# using a subquery.
session.query(func.count(User.id))
# return count of user "id" grouped
# by "name"
session.query(func.count(User.id)).\
group_by(User.name)
from sqlalchemy import distinct
# count distinct "name" values
session.query(func.count(distinct(User.name)))
这篇关于为什么SQLAlchemy count()比原始查询慢得多?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!