SQLAlchemy - How to count distinct on multiple columns

Question

I have this query:

SELECT COUNT(DISTINCT Serial, DatumOrig, Glucose) FROM values;

I am trying to recreate it with SQLAlchemy:

session.query(Value.Serial, Value.DatumOrig, Value.Glucose).distinct().count()

But this translates to:

SELECT count(*) AS count_1
    FROM (SELECT DISTINCT 
           values.`Serial` AS `values_Serial`, 
           values.`DatumOrig` AS `values_DatumOrig`,
           values.`Glucose` AS `values_Glucose`
          FROM values)
    AS anon_1

This does not call the count function inline; instead it wraps the SELECT DISTINCT in a subquery.

My question is: what are the different ways in SQLAlchemy to count a distinct select on multiple columns, and what SQL does each of them translate into?

Is there any solution that would translate into my original query? Is there any serious difference in performance or memory usage?

Recommended answer

First off, I think that COUNT(DISTINCT) supporting more than one expression is a MySQL extension. You can achieve something similar in, for example, PostgreSQL with ROW values, but the behaviour regarding NULL is not the same. In MySQL, if any of the value expressions evaluates to NULL, the row does not qualify. That is also what differs between the two queries in the question:

  1. If any of Serial, DatumOrig, or Glucose is NULL in the COUNT(DISTINCT) query, that row does not qualify; in other words, it is not counted.
  2. COUNT(*) is the cardinality of the subquery anon_1, in other words its row count. SELECT DISTINCT Serial, DatumOrig, Glucose will include (distinct) rows that contain NULL.
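
The NULL handling described in the two points above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table name glucose_values and the sample rows are made up, and SQLite is used because it needs no server. SQLite does not accept the multi-argument COUNT(DISTINCT ...) form, so the MySQL query itself is not run here; instead, the second query filters out rows containing NULL, which is meant to mimic the MySQL behaviour described above.

```python
import sqlite3

# Hypothetical table and sample rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE glucose_values (Serial TEXT, DatumOrig TEXT, Glucose REAL)"
)
conn.executemany(
    "INSERT INTO glucose_values VALUES (?, ?, ?)",
    [
        ("A", "2020-01-01", 5.0),
        ("A", "2020-01-01", 5.0),   # exact duplicate of the first row
        ("B", "2020-01-02", None),  # NULL Glucose
        ("B", "2020-01-02", None),  # duplicate of the NULL row
        ("C", None, 6.1),           # NULL DatumOrig
        ("D", "2020-01-03", 7.2),
    ],
)

# The shape SQLAlchemy generated in the question: every distinct row is
# counted, including the distinct rows that contain NULL.
subquery_count = conn.execute(
    "SELECT COUNT(*) FROM ("
    " SELECT DISTINCT Serial, DatumOrig, Glucose FROM glucose_values"
    ") AS anon_1"
).fetchone()[0]

# Same shape, but rows with any NULL are excluded first (assumption: this
# mirrors how MySQL's multi-argument COUNT(DISTINCT ...) skips such rows).
non_null_count = conn.execute(
    "SELECT COUNT(*) FROM ("
    " SELECT DISTINCT Serial, DatumOrig, Glucose FROM glucose_values"
    " WHERE Serial IS NOT NULL"
    "   AND DatumOrig IS NOT NULL"
    "   AND Glucose IS NOT NULL"
    ") AS anon_1"
).fetchone()[0]

print(subquery_count, non_null_count)  # prints: 4 2
```

With this sample data the two counts differ by exactly the two distinct rows that contain a NULL, so the two query forms are only interchangeable when the columns are known to be non-NULL.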

Looking at the EXPLAIN output for the two queries, it looks like the subquery causes MySQL to use a temporary table. That will likely cause a performance difference, especially if the temporary table is materialized on disk.

Producing the multi-valued COUNT(DISTINCT) query in SQLAlchemy is a bit tricky, because count() is a generic function and is implemented closer to the SQL standard. It accepts only a single expression as its (optional) positional argument, and the same goes for distinct(). If all else fails, you can always fall back to text() fragments, as in this case:

from sqlalchemy import distinct, func, text

# NOTE: text() fragments are included in the query as is, so if the text originates
# from an untrusted source, the query cannot be trusted.
session.query(func.count(distinct(text("`Serial`, `DatumOrig`, `Glucose`")))).\
    select_from(Value).\
    scalar()

This is far from readable and maintainable code, but it gets the job done for now. Another option is to write a custom construct that implements the MySQL extension, or to rewrite the query as you have attempted. One way to form a custom construct that produces the required SQL would be:

from itertools import count
from sqlalchemy import func, distinct as _distinct

def _comma_list(exprs):
    # NOTE: Magic number alert, the precedence value must be large enough to avoid
    # producing parentheses around the "comma list" when passed to distinct()
    ps = count(10 + len(exprs), -1)
    exprs = iter(exprs)
    cl = next(exprs)
    for p, e in zip(ps, exprs):
        cl = cl.op(',', precedence=p)(e)

    return cl

def distinct(*exprs):
    return _distinct(_comma_list(exprs))

session.query(func.count(distinct(
    Value.Serial, Value.DatumOrig, Value.Glucose))).scalar()
