Django Count和Sum批注相互干扰 [英] Django Count and Sum annotations interfere with each other

查看:57
本文介绍了Django Count和Sum批注相互干扰的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在构建带有多个批注的复合物 QuerySet 时,我遇到了一个问题,该问题可以通过以下简单设置重现。

While constructing a complexe QuerySet with several annotations, I ran into an issue that I could reproduce with the following simple setup.

以下是模型:

class Player(models.Model):
    name = models.CharField(max_length=200)

class Unit(models.Model):
    player = models.ForeignKey(Player, on_delete=models.CASCADE,
                               related_name='unit_set')
    rarity = models.IntegerField()

class Weapon(models.Model):
    unit = models.ForeignKey(Unit, on_delete=models.CASCADE,
                             related_name='weapon_set')

通过测试数据库,我获得了以下内容(正确)结果:

With my test database, I obtain the following (correct) results:

Player.objects.annotate(weapon_count=Count('unit_set__weapon_set'))

[{'id': 1, 'name': 'James', 'weapon_count': 23},
 {'id': 2, 'name': 'Max', 'weapon_count': 41},
 {'id': 3, 'name': 'Bob', 'weapon_count': 26}]


Player.objects.annotate(rarity_sum=Sum('unit_set__rarity'))

[{'id': 1, 'name': 'James', 'rarity_sum': 42},
 {'id': 2, 'name': 'Max', 'rarity_sum': 89},
 {'id': 3, 'name': 'Bob', 'rarity_sum': 67}]

如果我现在将两个注释合并在同一个 QuerySet ,我得到了不同的(不准确的)结果:

If I now combine both annotations in the same QuerySet, I obtain a different (inaccurate) results:

Player.objects.annotate(
    weapon_count=Count('unit_set__weapon_set', distinct=True),
    rarity_sum=Sum('unit_set__rarity'))

[{'id': 1, 'name': 'James', 'weapon_count': 23, 'rarity_sum': 99},
 {'id': 2, 'name': 'Max', 'weapon_count': 41, 'rarity_sum': 183},
 {'id': 3, 'name': 'Bob', 'weapon_count': 26, 'rarity_sum': 113}]

请注意 rarity_sum 现在如何具有不同的值 比以前。删除 distinct = True 不会影响结果。我还尝试使用DistinctSum 函数-values / 54278155#54278155>此答案,在这种情况下,所有 rarity_sum 都设置为 18 (也不准确)。

Notice how rarity_sum have now different values than before. Removing distinct=True does not affect the result. I also tried to use the DistinctSum function from this answer, in which case all rarity_sum are set to 18 (also inaccurate).

为什么?如何在同一个 QuerySet 中合并这两个注释?

Why is this? How can I combine both annotations in the same QuerySet?

编辑:这是组合QuerySet生成的sqlite查询:

Edit: here is the sqlite query generated by the combined QuerySet:

SELECT "sandbox_player"."id",
       "sandbox_player"."name",
       COUNT(DISTINCT "sandbox_weapon"."id") AS "weapon_count",
       SUM("sandbox_unit"."rarity")          AS "rarity_sum"
FROM "sandbox_player"
         LEFT OUTER JOIN "sandbox_unit" ON ("sandbox_player"."id" = "sandbox_unit"."player_id")
         LEFT OUTER JOIN "sandbox_weapon" ON ("sandbox_unit"."id" = "sandbox_weapon"."unit_id")
GROUP BY "sandbox_player"."id", "sandbox_player"."name"

用于以上结果的数据是在此处可用

The data used for the results above is available here.

推荐答案

这不是Django ORM的问题,这只是关系型数据库的方式ses工作。当您构建简单的查询集时,例如

This isn't the problem with Django ORM, this is just the way relational databases work. When you're constructing simple querysets like

Player.objects.annotate(weapon_count=Count('unit_set__weapon_set'))

Player.objects.annotate(rarity_sum=Sum('unit_set__rarity'))

ORM确实可以满足您的要求希望它能做到-将 Player 武器

ORM does exactly what you expect it to do - join Player with Weapon

SELECT "sandbox_player"."id", "sandbox_player"."name", COUNT("sandbox_weapon"."id") AS "weapon_count"
FROM "sandbox_player"
LEFT OUTER JOIN "sandbox_unit" 
    ON ("sandbox_player"."id" = "sandbox_unit"."player_id")
LEFT OUTER JOIN "sandbox_weapon" 
    ON ("sandbox_unit"."id" = "sandbox_weapon"."unit_id")
GROUP BY "sandbox_player"."id", "sandbox_player"."name"

Player Unit

SELECT "sandbox_player"."id", "sandbox_player"."name", SUM("sandbox_unit"."rarity") AS "rarity_sum"
FROM "sandbox_player"
LEFT OUTER JOIN "sandbox_unit" ON ("sandbox_player"."id" = "sandbox_unit"."player_id")
GROUP BY "sandbox_player"."id", "sandbox_player"."name"

并执行 COUNT SUM 聚合。

请注意,尽管第一个查询在三个表之间有两个联接,中间表 Unit 既不在 SELECT 引用的列中,也不在 GROUP BY 子句。 部队在这里扮演的唯一角色是通过武器玩家 c>。

Note that although the first query has two joins between three tables, the intermediate table Unit is neither in columns referenced in SELECT, nor in the GROUP BY clause. The only role that Unit plays here is to join Player with Weapon.

现在,如果您查看第三个查询集,情况将变得更加复杂。同样,在第一个查询中,联接位于三个表之间,但是现在在 SELECT 中引用了 Unit SUM 用于 Unit.rarity 的聚合:

Now if you look at your third queryset, things get more complicated. Again, as in the first query the joins are between three tables, but now Unit is referenced in SELECT as there is SUM aggregation for Unit.rarity:

SELECT "sandbox_player"."id",
       "sandbox_player"."name",
       COUNT(DISTINCT "sandbox_weapon"."id") AS "weapon_count",
       SUM("sandbox_unit"."rarity")          AS "rarity_sum"
FROM "sandbox_player"
         LEFT OUTER JOIN "sandbox_unit" ON ("sandbox_player"."id" = "sandbox_unit"."player_id")
         LEFT OUTER JOIN "sandbox_weapon" ON ("sandbox_unit"."id" = "sandbox_weapon"."unit_id")
GROUP BY "sandbox_player"."id", "sandbox_player"."name"

这是第二和第三查询之间的关键区别。在第二个查询中,您将 Player 加入 Unit ,因此只有一个 Unit 会针对其引用的每个玩家列出一次。

And this is the crucial difference between the second and the third queries. In the second query, you're joining Player to Unit, so a single Unit will be listed once for each player that it references.

但是在第三个查询中,您要加入 Player 转换为 Unit ,然后将 Unit 转换为武器,因此,不仅会为引用的每个玩家列出一个 Unit 一次,而且还会为引用 Unit

But in the third query you're joining Player to Unit and then Unit to Weapon, so not only a single Unit will be listed once for each player that it references, but also for each weapon that references Unit.

让我们看一个简单的示例:

Let's take a look at the simple example:

insert into sandbox_player values (1, "player_1");

insert into sandbox_unit values(1, 10, 1);

insert into sandbox_weapon values (1, 1), (2, 1);

一个玩家,一个单位和两个引用相同单位的武器。

One player, one unit and two weapons that reference the same unit.

确认问题存在:

>>> from sandbox.models import Player
>>> from django.db.models import Count, Sum

>>> Player.objects.annotate(weapon_count=Count('unit_set__weapon_set')).values()
<QuerySet [{'id': 1, 'name': 'player_1', 'weapon_count': 2}]>

>>> Player.objects.annotate(rarity_sum=Sum('unit_set__rarity')).values()
<QuerySet [{'id': 1, 'name': 'player_1', 'rarity_sum': 10}]>


>>> Player.objects.annotate(
...     weapon_count=Count('unit_set__weapon_set', distinct=True),
...     rarity_sum=Sum('unit_set__rarity')).values()
<QuerySet [{'id': 1, 'name': 'player_1', 'weapon_count': 2, 'rarity_sum': 20}]>

从此示例中可以很容易地看出问题是在组合查询中单位将被列出两次,每种武器都将被列出一次。

From this example it's easy to see that the problem is that in the combined query the unit will be listed twice, one time for each of the weapons referencing it:

sqlite> SELECT "sandbox_player"."id",
   ...>        "sandbox_player"."name",
   ...>        "sandbox_weapon"."id",
   ...>        "sandbox_unit"."rarity"
   ...> FROM "sandbox_player"
   ...>          LEFT OUTER JOIN "sandbox_unit" ON ("sandbox_player"."id" = "sandbox_unit"."player_id")
   ...>          LEFT OUTER JOIN "sandbox_weapon" ON ("sandbox_unit"."id" = "sandbox_weapon"."unit_id");
id          name        id          rarity    
----------  ----------  ----------  ----------
1           player_1    1           10        
1           player_1    2           10   





如@ivissani所述,最简单的解决方案之一是为每个聚合编写子查询:

What should you do?

As @ivissani mentioned, one of the easiest solutions would be to write subqueries for each of the aggregations:

>>> from django.db.models import Count, IntegerField, OuterRef, Subquery, Sum
>>> weapon_count = Player.objects.annotate(weapon_count=Count('unit_set__weapon_set')).filter(pk=OuterRef('pk'))
>>> rarity_sum = Player.objects.annotate(rarity_sum=Sum('unit_set__rarity')).filter(pk=OuterRef('pk'))
>>> qs = Player.objects.annotate(
...     weapon_count=Subquery(weapon_count.values('weapon_count'), output_field=IntegerField()),
...     rarity_sum=Subquery(rarity_sum.values('rarity_sum'), output_field=IntegerField())
... )
>>> qs.values()
<QuerySet [{'id': 1, 'name': 'player_1', 'weapon_count': 2, 'rarity_sum': 10}]>

会产生以下SQL

SELECT "sandbox_player"."id", "sandbox_player"."name", 
(
    SELECT COUNT(U2."id") AS "weapon_count"
    FROM "sandbox_player" U0 
    LEFT OUTER JOIN "sandbox_unit" U1
        ON (U0."id" = U1."player_id")
    LEFT OUTER JOIN "sandbox_weapon" U2 
        ON (U1."id" = U2."unit_id")
    WHERE U0."id" = ("sandbox_player"."id") 
    GROUP BY U0."id", U0."name"
) AS "weapon_count", 
(
    SELECT SUM(U1."rarity") AS "rarity_sum"
    FROM "sandbox_player" U0
    LEFT OUTER JOIN "sandbox_unit" U1
        ON (U0."id" = U1."player_id")
    WHERE U0."id" = ("sandbox_player"."id")
GROUP BY U0."id", U0."name") AS "rarity_sum"
FROM "sandbox_player"

这篇关于Django Count和Sum批注相互干扰的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆