获得连续几天以来评论次数最高的应用 [英] Get apps with the highest review count since a dynamic series of days

查看:63
本文介绍了获得连续几天以来评论次数最高的应用的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个表, apps reviews (为便于讨论而简化):

I have two tables, apps and reviews (simplified for the sake of discussion):

id int

评论

id          int
review_date date
app_id      int (foreign key that points to apps)

2个问题:

给出从最早的 reviews.review_date 到最新的 reviews.review_date (按天递增)的一系列日期,每个日期为 D ,如果应用程序最早的评论是在 D 或之后出现的,则哪些应用程序的评论最多?

Given a series of dates from the earliest reviews.review_date to the latest reviews.review_date (incrementing by a day), for each date, D, which apps had the most reviews if the app's earliest review was on or later than D?

我想我知道如果给出明确的日期怎么写查询:

I think I know how to write a query if given an explicit date:

SELECT
  apps.id,
  count(reviews.*)
FROM
  reviews
  INNER JOIN apps ON apps.id = reviews.app_id
group by
  1
having
  min(reviews.review_date) >= '2020-01-01'
  order by 2 desc
limit 10;

但是我不知道如何在给定所需的日期序列的情况下动态查询此信息,并在单个视图中汇总所有这些信息.

But I don't know how to query this dynamically given the desired date series and compile all this information in a single view.

最好在每个日期同时具有评论数和 app_id .截至目前,我正在思考可能看起来像这样的东西:

It would be nice to have the # of reviews at the time for each date as well as the app_id. As of now I'm thinking something that might look like:

... 2020-01-01_app_id | 2020-01-01_review_count | 2020-01-02_app_id | 2020-01-02_review_count ...

但是我想知道是否有更好的方法可以做到这一点.将数据拼接在一起似乎也是一个挑战.

But I'm wondering if there's a better way to do this. Stitching the data together also seems like a challenge.

推荐答案

认为这就是您要寻找的:

WITH cte AS (  -- MATERIALIZED
   SELECT app_id, min(review_date) AS earliest_review, count(*)::int AS total_ct
   FROM   reviews
   GROUP  BY 1
   )
SELECT *
FROM  (
   SELECT generate_series(min(review_date)
                        , max(review_date)
                        , '1 day')::date
   FROM   reviews
   ) d(review_window_start)
LEFT  JOIN LATERAL (
   SELECT total_ct, array_agg(app_id) AS apps
   FROM  (
      SELECT app_id, total_ct
      FROM   cte c
      WHERE  c.earliest_review >= d.review_window_start
      ORDER  BY total_ct DESC
      FETCH  FIRST 1 ROWS WITH TIES  -- new & hot
      ) sub
   GROUP  BY 1
   ) a ON true;

有领带使它便宜一些.已在Postgres 13(当前为beta)中添加.参见:

WITH TIES makes it a bit cheaper. Added in Postgres 13 (currently beta). See:

WITH cte AS (  -- MATERIALIZED
   SELECT app_id, min(review_date) AS earliest_review, count(*)::int AS total_ct
   FROM   reviews
   GROUP  BY 1
   )
SELECT *
FROM  (
   SELECT generate_series(min(review_date)
                        , max(review_date)
                        , '1 day')::date
   FROM   reviews
   ) d(review_window_start)
LEFT  JOIN LATERAL (
   SELECT total_ct, array_agg(app_id) AS apps
   FROM  (
      SELECT total_ct, app_id
          ,  rank() OVER (ORDER BY total_ct DESC) AS rnk
      FROM   cte c
      WHERE  c.earliest_review >= d.review_window_start
      ) sub
   WHERE  rnk = 1
   GROUP  BY 1
   ) a ON true;

db<>小提琴此处

与上述相同,但没有有领带.

我们根本不需要涉及表 apps .表格 reviews 包含了我们需要的所有信息.

We don't need to involve the table apps at all. The table reviews has all information we need.

CTE cte 计算最早的评论&每个应用程序当前的总计数.CTE避免了重复计算.应该会帮助很多.
它总是在Postgres 12之前实现,并且应该在Postgres 12中自动实现,因为它在主查询中使用了很多次.另外,您可以添加关键字 MATERIALIZED 在Postgres 12或更高版本中强制使用.参见:

The CTE cte computes earliest review & current total count per app. The CTE avoids repeated computation. Should help quite a bit.
It is always materialized before Postgres 12, and should be materialized automatically in Postgres 12 since it is used many times in the main query. Else you could add the keyword MATERIALIZED in Postgres 12 or later to force it. See:

经过优化的 generate_series()调用产生从最早到最新审核的那几天.参见:

The optimized generate_series() call produces the series of days from earliest to latest review. See:

最后,您已经发现的 LEFT JOIN LATERAL .但是,由于多个应用可以并列以获得最多的评论,因此请检索所有获奖者,可以是0-n个应用.该查询将所有每日获胜者汇总到一个数组中,因此我们每个 review_window_start 都获得一个结果行.另外,定义决胜局最多获得一个一个获胜者.参见:

Finally, the LEFT JOIN LATERAL you already discovered. But since multiple apps can tie for the most reviews, retrieve all winners, which can be 0 - n apps. The query aggregates all daily winners into an array, so we get a single result row per review_window_start. Alternatively, define tiebreaker(s) to get at most one winner. See:

这篇关于获得连续几天以来评论次数最高的应用的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆