Django & Postgres - percentile (median) and group by


Question

I need to calculate period medians per seller ID (see the simplified model below). The problem is that I am unable to construct the ORM query.

Model

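# Django 2.2: ArrayField and JSONField come from django.contrib.postgres
from django.contrib.postgres.fields import ArrayField, JSONField
from django.db import models


class MyModel(models.Model):
    period = models.IntegerField(null=True, default=None)
    seller_ids = ArrayField(models.IntegerField(), default=list)
    aux = JSONField(default=dict)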

Query

queryset = (
    MyModel.objects.filter(period=25)
    .annotate(seller_id=Func(F("seller_ids"), function="unnest"))
    .values("seller_id")
    .annotate(
        duration=Cast(KeyTextTransform("duration", "aux"), IntegerField()),
        median=Func(
            F("duration"),
            function="percentile_cont",
            template="%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)",
        ),
    )
    .values("median", "seller_id")
)

Source of the ArrayField aggregation (seller_id)



I think what I need to do is something along the lines of the following:

select t.*, p_25, p_75
from t join
     (select district,
             percentile_cont(0.25) within group (order by sales) as p_25,
             percentile_cont(0.75) within group (order by sales) as p_75
      from t
      group by district
     ) td
     on t.district = td.district

Source of the example above



Python 3.7.5, Django 2.2.8, Postgres 11.1

Recommended answer

This does the trick.

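# Django 2.2 import path for KeyTextTransform
from django.contrib.postgres.fields.jsonb import KeyTextTransform
from django.db.models import F, Func, IntegerField
from django.db.models.aggregates import Aggregate
from django.db.models.functions import Cast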


queryset = (
    MyModel.objects.filter(period=25)
    .annotate(duration=Cast(KeyTextTransform("duration", "aux"), IntegerField()))
    .filter(duration__isnull=False)
    .annotate(seller_id=Func(F("seller_ids"), function="unnest"))
    .values("seller_id")  # group by
    .annotate(
        median=Aggregate(
            F("duration"),
            function="percentile_cont",
            template="%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)",
        ),
    )
)

Notice that the median annotation uses Aggregate, not Func as in the question: Django only emits a GROUP BY for expressions it recognizes as aggregates, and a plain Func is not one. Also, the order of the annotate() and filter() clauses, as well as the order of the annotate() and values() clauses, matters a lot: values("seller_id") must come before the aggregate annotation so that it defines the grouping.

BTW, the resulting SQL contains no nested select or join.
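
To sanity-check the numbers the query returns, it can help to reproduce the same computation in plain Python. The sketch below is not part of the original answer: the helper names `percentile_cont` and `median_per_seller` and the sample rows are made up for illustration, and the rows are plain dicts shaped like MyModel rather than real model instances. It assumes the linear-interpolation definition that PostgreSQL documents for percentile_cont.

```python
from collections import defaultdict


def percentile_cont(values, fraction):
    """Continuous percentile with linear interpolation between neighbours."""
    ordered = sorted(values)
    if not ordered:
        return None  # an empty group has no percentile
    position = fraction * (len(ordered) - 1)  # zero-based fractional index
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


def median_per_seller(rows, period):
    """Filter by period, skip missing durations, unnest seller_ids, group, take the median."""
    durations = defaultdict(list)
    for row in rows:
        duration = row["aux"].get("duration")
        if row["period"] != period or duration is None:
            continue  # mirrors filter(period=...) and filter(duration__isnull=False)
        for seller_id in row["seller_ids"]:  # plays the role of unnest()
            durations[seller_id].append(int(duration))
    return {seller: percentile_cont(vals, 0.5) for seller, vals in durations.items()}


# Hypothetical sample data, shaped like MyModel rows.
rows = [
    {"period": 25, "seller_ids": [1, 2], "aux": {"duration": 10}},
    {"period": 25, "seller_ids": [1], "aux": {"duration": 20}},
    {"period": 25, "seller_ids": [2], "aux": {}},
    {"period": 24, "seller_ids": [1], "aux": {"duration": 99}},
]
medians = median_per_seller(rows, period=25)
print(medians)
```

Passing a different fraction to `percentile_cont` (for example 0.25 or 0.75) gives the quartiles from the SQL example in the question.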
