多个注释和项产生夸张的答案 [英] multiple annotate Sum terms yields inflated answer

查看:73
本文介绍了多个注释和项产生夸张的答案的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在以下设置中,我想要一个带有项目列表的QuerySet,每个项目都标有其所有任务工期的总和(如tasks_duration)和其所有任务子任务工期的总和(如subtasks_duration)。我的模型(简化的)看起来像这样:

In the following setup, I'd like a QuerySet with a list of projects, each annotated with the sum of all its task durations (as tasks_duration) and the sum of all of its tasks' subtask durations (as subtasks_duration). My models (simplified) look like this:

class Project(models.Model):
    pass

class Task(models.Model):
    project = models.ForeignKey(Project)
    duration = models.IntegerField(blank=True, null=True)

class SubTask(models.Model):
    task = models.ForeignKey(Task)
    duration = models.IntegerField(blank=True, null=True)

我使我的QuerySet像这样:

I make my QuerySet like this:

Projects.objects.annotate(tasks_duration=Sum('task__duration'), subtasks_duration=Sum('task__subtask__duration'))

Django annotate()多次解释的行为有关,导致错误的答案我得到的task_duration比它高得多应该。多个annotate(Sum())子句在结果SQL中产生多个左内部联接。仅使用task_duration的一个annotate(Sum())术语,结果是正确的。但是,我想同时拥有task_duration和subtasks_duration。

Related to the behaviour explained in Django annotate() multiple times causes wrong answers I get a tasks_duration that is much higher than it should be. The multiple annotate(Sum()) clauses yield multiple left inner joins in the resultant SQL. With only a single annotate(Sum()) term for tasks_duration, the result is correct. However, I'd like to have both tasks_duration and subtasks_duration.

什么是执行此查询的合适方法?我有一个可行的解决方案,可以针对每个项目执行此操作,但是这种速度出乎意料的缓慢。我还可以通过extra()调用进行类似操作,但是我真的很想知道纯Django是否可以实现我想要的功能。

What would be a suitable way to do this query? I have a working solution that does it per-project, but that's expectedly unusably slow. I also have something similar working with an extra() call, but I'd really like to know if what I want is possible with pure Django.

推荐答案

已在此处报告了该错误,但即使在Django 1.11中也尚未解决。问题与以反向关系连接两个表有关。
请注意,distinct参数适用于Count,但不适用于Sum。因此,您可以使用技巧来编写如下所示的ORM:

The bug is reported here but it's not solved yet even in Django 1.11. The issue is related to joining two tables in reverse relations. Notice that distinct parameter works well for Count but not for Sum. So you can use a trick and write an ORM like below:

 Projects.objects.annotate(
      temp_tasks_duration=Sum('task__duration'),
      temp_subtasks_duration=Sum('task__subtask__duration'),
      tasks_count=Count('task'),
      tasks_count_distinct=Count('task', distinct=True),
      task_subtasks_count=Count('task__subtask'),
      task_subtasks_count_distinct=Count('task__subtask', distinct=True),
 ).annotate(
      tasks_duration=F('temp_tasks_duration')*F('tasks_count_distinct')/F('tasks_count'),
      subtasks_duration=F('temp_subtasks_duration')*F('subtasks_count_distinct')/F('subtasks_count'),
 )

更新:
我发现您需要使用子查询。在以下解决方案中,首先过滤与externalref相关的任务(外部查询的OuterRef引用,因此将为每个Project过滤任务),然后按 project对任务进行分组,以使Sum适用于所有每个项目的任务,如果该项目存在任何任务(您已按项目过滤,然后按同一字段分组;这就是为什么只能有一组任务的原因),则仅返回一个结果。否则返回None。如果项目没有任务,结果将为无,这意味着我们无法使用[0]选择计算得出的总和。

Update: I found that you need to use Subquery. In the following solution, firstly you filter tasks for related to the outerref (OuterRef references to the outer query, so the tasks are filtered for each Project), then you group the tasks by 'project', so that the Sum applies on all the tasks of each projects and returns just one result if any task exists for the project (you have filtered by 'project' and then grouped by that same field; That's why just one group can be there.) or None otherwise. The result would be None if the project has no task, that means we can not use [0] to select the calculated sum.

from django.db.models import Subquery, OuterRef
Projects.objects.annotate(
    tasks_duration=Subquery(
        Task.objects.filter(
            project=OuterRef('pk')
        ).values(
            'project'
        ).annotate(
            the_sum=Sum('task__duration'),
        ).values('the_sum')[:1]
    ),
    subtasks_duration=Sum('task__subtask__duration')
)

运行此代码只会向数据库发送一个查询,因此性能很好。

Running this code will send just one query to the database, so the performance is great.

这篇关于多个注释和项产生夸张的答案的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆