获得连续几天以来评论次数最高的应用 [英] Get apps with the highest review count since a dynamic series of days
问题描述
我有两个表, apps
和 reviews
(为便于讨论而简化):
I have two tables, apps
and reviews
(simplified for the sake of discussion):
id int
评论
表
id int
review_date date
app_id int (foreign key that points to apps)
2个问题:
给出从最早的 reviews.review_date
到最新的 reviews.review_date
(按天递增)的一系列日期,每个日期为 D
,如果应用程序最早的评论是在 D
或之后出现的,则哪些应用程序的评论最多?
Given a series of dates from the earliest reviews.review_date
to the latest reviews.review_date
(incrementing by a day), for each date, D
, which apps had the most reviews if the app's earliest review was on or later than D
?
我想我知道如果给出明确的日期怎么写查询:
I think I know how to write a query if given an explicit date:
SELECT
apps.id,
count(reviews.*)
FROM
reviews
INNER JOIN apps ON apps.id = reviews.app_id
group by
1
having
min(reviews.review_date) >= '2020-01-01'
order by 2 desc
limit 10;
但是我不知道如何在给定所需的日期序列的情况下动态查询此信息,并在单个视图中汇总所有这些信息.
But I don't know how to query this dynamically given the desired date series and compile all this information in a single view.
最好在每个日期同时具有评论数和 app_id
.截至目前,我正在思考可能看起来像这样的东西:
It would be nice to have the # of reviews at the time for each date as well as the app_id
. As of now I'm thinking something that might look like:
... 2020-01-01_app_id | 2020-01-01_review_count | 2020-01-02_app_id | 2020-01-02_review_count ...
但是我想知道是否有更好的方法可以做到这一点.将数据拼接在一起似乎也是一个挑战.
But I'm wondering if there's a better way to do this. Stitching the data together also seems like a challenge.
推荐答案
我认为这就是您要寻找的:
WITH cte AS ( -- MATERIALIZED
SELECT app_id, min(review_date) AS earliest_review, count(*)::int AS total_ct
FROM reviews
GROUP BY 1
)
SELECT *
FROM (
SELECT generate_series(min(review_date)
, max(review_date)
, '1 day')::date
FROM reviews
) d(review_window_start)
LEFT JOIN LATERAL (
SELECT total_ct, array_agg(app_id) AS apps
FROM (
SELECT app_id, total_ct
FROM cte c
WHERE c.earliest_review >= d.review_window_start
ORDER BY total_ct DESC
FETCH FIRST 1 ROWS WITH TIES -- new & hot
) sub
GROUP BY 1
) a ON true;
有领带
使它便宜一些.已在Postgres 13(当前为beta)中添加.参见:
WITH TIES
makes it a bit cheaper. Added in Postgres 13 (currently beta). See:
WITH cte AS ( -- MATERIALIZED
SELECT app_id, min(review_date) AS earliest_review, count(*)::int AS total_ct
FROM reviews
GROUP BY 1
)
SELECT *
FROM (
SELECT generate_series(min(review_date)
, max(review_date)
, '1 day')::date
FROM reviews
) d(review_window_start)
LEFT JOIN LATERAL (
SELECT total_ct, array_agg(app_id) AS apps
FROM (
SELECT total_ct, app_id
, rank() OVER (ORDER BY total_ct DESC) AS rnk
FROM cte c
WHERE c.earliest_review >= d.review_window_start
) sub
WHERE rnk = 1
GROUP BY 1
) a ON true;
db<>小提琴此处
与上述相同,但没有有领带
.
我们根本不需要涉及表 apps
.表格 reviews
包含了我们需要的所有信息.
We don't need to involve the table apps
at all. The table reviews
has all information we need.
CTE cte
计算最早的评论&每个应用程序当前的总计数.CTE避免了重复计算.应该会帮助很多.
它总是在Postgres 12之前实现,并且应该在Postgres 12中自动实现,因为它在主查询中使用了很多次.另外,您可以添加关键字 MATERIALIZED
在Postgres 12或更高版本中强制使用.参见:
The CTE cte
computes earliest review & current total count per app. The CTE avoids repeated computation. Should help quite a bit.
It is always materialized before Postgres 12, and should be materialized automatically in Postgres 12 since it is used many times in the main query. Else you could add the keyword MATERIALIZED
in Postgres 12 or later to force it. See:
经过优化的 generate_series()
调用产生从最早到最新审核的那几天.参见:
The optimized generate_series()
call produces the series of days from earliest to latest review. See:
最后,您已经发现的 LEFT JOIN LATERAL
.但是,由于多个应用可以并列以获得最多的评论,因此请检索所有获奖者,可以是0-n个应用.该查询将所有每日获胜者汇总到一个数组中,因此我们每个 review_window_start
都获得一个结果行.另外,定义决胜局最多获得一个一个获胜者.参见:
Finally, the LEFT JOIN LATERAL
you already discovered. But since multiple apps can tie for the most reviews, retrieve all winners, which can be 0 - n apps. The query aggregates all daily winners into an array, so we get a single result row per review_window_start
. Alternatively, define tiebreaker(s) to get at most one winner. See:
这篇关于获得连续几天以来评论次数最高的应用的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!