Select data within one month prior to each user's last record
Problem description
Assume I have a table called "Diary" like this:
| id | user_id | recorded_at | record |
|----|---------|--------------------------|--------|
| 20 | 50245 |2017-10-01 23:00:14.765366| 89 |
| 21 | 50245 |2017-12-05 10:00:33.135331| 97 |
| 22 | 50245 |2017-12-31 11:50:23.965134| 80 |
| 23 | 76766 |2015-10-06 11:00:14.902452| 70 |
| 24 | 76766 |2015-10-07 22:40:59.124553| 81 |
For each user, I want to retrieve the latest row and all rows within one month prior to it.
In other words, for user_id 50245, I want his/her data from "2017-12-01 11:50:23.965134" to "2017-12-31 11:50:23.965134"; for user_id 76766, I want his/her data from "2015-09-07 22:40:59.124553" to "2015-10-07 22:40:59.124553".
Hence the desired result looks like this:
| id | user_id | recorded_at | record |
|----|---------|--------------------------|--------|
| 21 | 50245 |2017-12-05 10:00:33.135331| 97 |
| 22 | 50245 |2017-12-31 11:50:23.965134| 80 |
| 23 | 76766 |2015-10-06 11:00:14.902452| 70 |
| 24 | 76766 |2015-10-07 22:40:59.124553| 81 |
Please note that the record with id 20 is not included because it is more than one month prior to user_id 50245's last record.
Is there any way I can write an SQL query to achieve this?
Recommended answer
For small tables, any (valid) query technique is good.
For big tables, details matter. Assuming:
- There is also a `users` table with `user_id` as PK containing all relevant users (or possibly a few more). This is the typical setup.
- You have (or can create) an index on `diary (user_id, recorded_at DESC NULLS LAST)`. `NULLS LAST` is optional if `recorded_at` is defined `NOT NULL`. But make sure the query matches the index.
- More than a few rows per user, which is the typical use case.
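The assumptions above can be made concrete with a small PostgreSQL setup. This is only a sketch: the column types, the foreign key, and the index name `diary_user_id_recorded_at_idx` are assumptions inferred from the sample data, not something stated in the question.

```sql
-- Hypothetical schema; types and constraints are assumptions.
CREATE TABLE users (
    user_id integer PRIMARY KEY
);

CREATE TABLE diary (
    id          integer PRIMARY KEY,
    user_id     integer   NOT NULL REFERENCES users,
    recorded_at timestamp NOT NULL,
    record      integer
);

-- Sample rows from the question.
INSERT INTO users (user_id) VALUES (50245), (76766);

INSERT INTO diary (id, user_id, recorded_at, record) VALUES
    (20, 50245, '2017-10-01 23:00:14.765366', 89),
    (21, 50245, '2017-12-05 10:00:33.135331', 97),
    (22, 50245, '2017-12-31 11:50:23.965134', 80),
    (23, 76766, '2015-10-06 11:00:14.902452', 70),
    (24, 76766, '2015-10-07 22:40:59.124553', 81);

-- Index matching the ORDER BY used to find each user's latest row.
-- The index name is made up for illustration.
CREATE INDEX diary_user_id_recorded_at_idx
    ON diary (user_id, recorded_at DESC NULLS LAST);
```

Because `recorded_at` is declared `NOT NULL` in this sketch, `NULLS LAST` could be dropped from both the index and the query, as long as the two stay consistent.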
This should be among the fastest options:
    SELECT d.*
    FROM   users u
    CROSS  JOIN LATERAL (
       SELECT recorded_at
       FROM   diary
       WHERE  user_id = u.user_id
       ORDER  BY recorded_at DESC NULLS LAST
       LIMIT  1
       ) d1
    JOIN   diary d ON d.user_id = u.user_id
                  AND d.recorded_at >= d1.recorded_at - interval '1 month'
    ORDER  BY d.user_id, d.recorded_at;
Produces your desired result exactly.
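To see where the lower bound of each window falls, the interval arithmetic can be checked in isolation. The statements below are a small sketch using the two "last record" timestamps from the question as literals.

```sql
-- Lower bound for user_id 50245.
-- November has no 31st day, so PostgreSQL clamps to the last day of the month.
SELECT timestamp '2017-12-31 11:50:23.965134' - interval '1 month';
-- 2017-11-30 11:50:23.965134

-- Lower bound for user_id 76766.
SELECT timestamp '2015-10-07 22:40:59.124553' - interval '1 month';
-- 2015-09-07 22:40:59.124553
```

For user_id 50245 the computed bound is 2017-11-30 rather than the 2017-12-01 written in the question. With the sample data this makes no difference, since that user has no rows between the two dates. If a fixed-length window is wanted instead, something like `interval '30 days'` would be one alternative.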
For only a few rows per user, `max()` or `DISTINCT ON ()` in a subquery are typically faster.
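The two subquery variants mentioned above might look roughly like this. Both are sketches written against the `diary` table from the question; the alias `last_recorded_at` is introduced here for illustration only.

```sql
-- Variant 1: max() in a subquery.
SELECT d.*
FROM  (
   SELECT user_id, max(recorded_at) AS last_recorded_at
   FROM   diary
   GROUP  BY user_id
   ) d1
JOIN   diary d USING (user_id)
WHERE  d.recorded_at >= d1.last_recorded_at - interval '1 month'
ORDER  BY d.user_id, d.recorded_at;

-- Variant 2: DISTINCT ON in a subquery (PostgreSQL-specific).
SELECT d.*
FROM  (
   SELECT DISTINCT ON (user_id)
          user_id, recorded_at AS last_recorded_at
   FROM   diary
   ORDER  BY user_id, recorded_at DESC NULLS LAST
   ) d1
JOIN   diary d USING (user_id)
WHERE  d.recorded_at >= d1.last_recorded_at - interval '1 month'
ORDER  BY d.user_id, d.recorded_at;
```

Neither variant needs the `users` table. Both read the whole `diary` table to find each user's latest timestamp, which is why they tend to pay off only when there are few rows per user.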