如何获得基于子查询的计数? [英] How to get a Count based on a subquery?

查看:63
本文介绍了如何获得基于子查询的计数?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

尽管我一直在尝试根据网络搜索进行尝试,但我仍然无法使查询正常运行,我想在变得疯狂之前我需要一些帮助.

I am suffering to get a query working despite all I have been trying based on my web search, and I think I need some help before becoming crazy.

我有四个型号:

class Series(models.Model):
    puzzles = models.ManyToManyField(Puzzle, through='SeriesElement', related_name='series')
    ...

class Puzzle(models.Model):
    puzzles = models.ManyToManyField(Puzzle, through='SeriesElement', related_name='series')
    ...

class SeriesElement(models.Model):
    puzzle = models.ForeignKey(Puzzle,on_delete=models.CASCADE,verbose_name='Puzzle',)
    series = models.ForeignKey(Series,on_delete=models.CASCADE,verbose_name='Series',)
    puzzle_index = models.PositiveIntegerField(verbose_name='Order',default=0,editable=True,)

class Play(models.Model):
    puzzle = models.ForeignKey(Puzzle, on_delete=models.CASCADE, related_name='plays')
    user = models.ForeignKey(settings.AUTH_USER_MODEL, blank=True,null=True, on_delete=models.SET_NULL, related_name='plays')
    series = models.ForeignKey(Series, blank=True, null=True, on_delete=models.SET_NULL, related_name='plays')
    puzzle_completed = models.BooleanField(default=None, blank=False, null=False)
    ...

每个用户每次创建一个 Play 记录都可以多次玩任何拼图.这意味着对于给定的一组(用户,系列,拼图),我们可以有几条 Play 记录,有些带有 puzzle_completed =真,有些带有 puzzle_completed = False

each user can play any puzzle several times, each time creating a Play record. that means that for a given set of (user,series,puzzle) we can have several Play records, some with puzzle_completed = True, some with puzzle_completed = False

我正在尝试(未成功)实现的是,通过注释为每个系列计算谜题的数量 nb_completed_by_user nb_not_completed_by_user .

What I am trying (unsuccesfully) to achieve, is to calculate for each series, through an annotation, the number of puzzles nb_completed_by_user and nb_not_completed_by_user.

对于 nb_completed_by_user ,我有一些可以在几乎所有情况下使用的功能(到目前为止,我无法解释的一项测试中有一个小故障):

For nb_completed_by_user, I have something which works in almost all cases (I have one glitch in one of my test that I cannot explain so far):

Series.objects.annotate(nb_completed_by_user=Count('puzzles',
filter=Q(puzzles__plays__puzzle_completed=True, 
    puzzles__plays__series_id=F('id'),puzzles__plays__user=user), distinct=True))

对于 nb_not_completed_by_user ,我能够在 Puzzle 上进行查询,这给了我很好的答案,但是我无法将其转换为 Subquery表达式可以正常工作而不会引发错误,或者获得 Count 表达式可以为我提供正确的答案.

For nb_not_completed_by_user, I was able to make a query on Puzzle that gives me the good answer, but I am not able to transform it into a Subquery expression that works without throwing up an error, or to get a Count expression to give me the proper answer.

此作品有效:

puzzles = Puzzle.objects.filter(~Q(plays__puzzle_completed=True,
 plays__series_id=1, plays__user=user),series=s)

但是当尝试移动到子查询时,我找不到使用以下表达式不引发错误的方法: ValueError:此查询集包含对外部查询的引用,只能在子查询中使用

but when trying to move to a subquery, I cannot find the way to use the following expression not to throw the error:ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.

pzl_completed_by_user = Puzzle.objects.filter(plays__series_id=OuterRef('id')).exclude(
    plays__puzzle_completed=True,plays__series_id=OuterRef('id'), plays__user=user)

和下面的 Count 表达式不能给我正确的结果:

and the following Count expression doesn't give me the right result:

Series.objects.annotate(nb_not_completed_by_user=Count('puzzles', filter=~Q(
            puzzle__plays__puzzle_completed=True, puzzle__plays__series_id=F('id'), 
            puzzle__plays__user=user))

有人可以解释我如何获得这两个价值吗?并最终向我提出一个链接,该链接清楚地说明了如何在不那么明显的情况下(与官方文档中的情况相比)使用子查询

Could anybody explain me how I could obtain both values ? and eventually to propose me a link which explains clearly how to use subqueries for less-obvious cases than those in the official documentation

预先感谢

编辑2021年3月:我最近发现了两个帖子,这些帖子引导我为这个特定问题提供了一种可能的解决方案: Django Count和Sum注释会相互干扰 Django 1.11注释子查询聚合

Edit March 2021: I recently found two posts which guided me through one potential solution to this specific issue: Django Count and Sum annotations interfere with each other and Django 1.11 Annotating a Subquery Aggregate

我从> https://stackoverflow.com/users/188/matthew-schinckel 实施了建议的解决方案和 https://stackoverflow.com/users/1164966/benoit-blanchon 有帮助类: SubqueryCount(Subquery)类和 SubquerySum(Subquery)

I implemented the proposed solution from https://stackoverflow.com/users/188/matthew-schinckel and https://stackoverflow.com/users/1164966/benoit-blanchon having help classes: class SubqueryCount(Subquery) and class SubquerySum(Subquery)

class SubqueryCount(Subquery):
    template = "(SELECT count(*) FROM (%(subquery)s) _count)"
    output_field = PositiveIntegerField()


class SubquerySum(Subquery):
    template = '(SELECT sum(_sum."%(column)s") FROM (%(subquery)s) _sum)'

    def __init__(self, queryset, column, output_field=None, **extra):
        if output_field is None:
            output_field = queryset.model._meta.get_field(column)
        super().__init__(queryset, output_field, column=column, **extra)

效果非常好!并且比常规的Django Count注释要快得多....至少在SQlite中,也许在其他人所说的PostgreSQL中.

It works extremely well ! and is far quicker than the conventional Django Count annotation. ... at least in SQlite, and probably PostgreSQL as stated by others.

但是,当我在MariaDB环境中尝试...时,它崩溃了!MariaDB显然不能够/不愿意处理相关的子查询,因为它们被认为是次优的.

But when I tried in a MariaDB environnement ... it crashed ! MariaDB is apparently not able / not willing to handle correlated subqueries as those are considered sub-optimal.

以我为例,当我尝试从数据库中同时为每个记录获取多个Count/distinct批注时,我确实看到了性能的巨大提升(在SQLite中)我想在MariaDB中复制.

In my case, as I try to get from the database multiple Count/distinct annotations for each record at the same time, I really see a tremendous gain in performance (in SQLite) that I would like to replicate in MariaDB.

任何人都可以帮助我找出一种为MariaDB实现这些辅助功能的方法吗?

Would anyone be able to help me figure out a way to implement those helper functions for MariaDB ?

在此环境中 template 应该是什么?

What should template be in this environnement?

马修·辛克尔?benoit-blanchon?rktavi?

matthew-schinckel ? benoit-blanchon ? rktavi ?

推荐答案

再深入一点,详细分析Django文档,我终于能够找到一种令人满意的方式来基于子查询生成Count或Sum

Going a bit deeper and analysis the Django docs a bit more in details, I was finally able to produce a satisfying way to produce a Count or Sum based on subquery.

为简化过程,我定义了以下辅助函数:

For simplifying the process, I defined the following helper functions:

要生成子查询:

def get_subquery(app_label, model_name, reference_to_model_object, filter_parameters={}):
    """
    Return a subquery from a given model (work with both FK & M2M)
    can add extra filter parameters as dictionary:

    Use:
        subquery = get_subquery(
                    app_label='puzzles', model_name='Puzzle',
                    reference_to_model_object='puzzle_family__target'
                    )
        or directly:
        qs.annotate(nb_puzzles=subquery_count(get_subquery(
            'puzzles', 'Puzzle','puzzle_family__target')),)
    """
    model = apps.get_model(app_label, model_name)

    # we need to declare a local dictionary to prevent the external dictionary to be changed by the update method:
    parameters = {f'{reference_to_model_object}__id': OuterRef('id')}
    parameters.update(filter_parameters)
    # putting '__id' instead of '_id' to work with both FK & M2M
    return model.objects.filter(**parameters).order_by().values(f'{reference_to_model_object}__id')

要计算通过 get_subquery 生成的子查询:

To count the subquery generated through get_subquery:

def subquery_count(subquery):
    """  
    Use:
        qs.annotate(nb_puzzles=subquery_count(get_subquery(
            'puzzles', 'Puzzle','puzzle_family__target')),)
    """
    return Coalesce(Subquery(subquery.annotate(count=Count('pk', distinct=True)).order_by().values('count'), output_field=PositiveIntegerField()), 0)

对通过 field_to_sum 字段上的 get_subquery 生成的子查询求和:

To sum the subquery generated through get_subquery on the field field_to_sum:

def subquery_sum(subquery, field_to_sum, output_field=None):
    """  
    Use:
        qs.annotate(total_points=subquery_sum(get_subquery(
            'puzzles', 'Puzzle','puzzle_family__target'),'points'),)
    """
    if output_field is None:
        output_field = queryset.model._meta.get_field(column)

    return Coalesce(Subquery(subquery.annotate(result=Sum(field_to_sum, output_field=output_field)).order_by().values('result'), output_field=output_field), 0)

所需的导入:

from django.db.models import Count, Subquery, PositiveIntegerField, DecimalField, Sum
from django.db.models.functions import Coalesce

我花了很多时间解决这个问题...我希望这会避免许多人因寻找正确的前进方式而感到沮丧.

I spent so many hours on solving this ... I hope that this will save many of you all the frustration I went through figuring out the right way to proceed.

这篇关于如何获得基于子查询的计数?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆