每天如何选择1条以上的记录? [英] How to select more than 1 record per day?
问题描述
这是一个 postgresql
问题。
PostgreSQL 8.3.3 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-9).
表格如下:
date_time other_column
2012-11-01 00:00:00 ...
2012-11-02 01:00:00 ...
2012-11-02 02:00:00 ...
2012-11-02 03:00:00 ...
2012-11-02 04:00:00 ...
2012-11-03 05:00:00 ...
2012-11-03 06:00:00 ...
2012-11-05 00:00:00 ...
2012-11-07 00:00:00 ...
2012-11-07 00:00:00 ...
...
我想从特定日期范围内选择 每天最多3条记录 。
I want to select at most 3 records per day from a specific date range.
例如,我想从2012-11-02到2012-11-05最多选择3条记录。
预期结果
将是:
For example, I want to select at most 3 records from 2012-11-02 to 2012-11-05.
The expected result
would be:
date_time other_column
2012-11-02 01:00:00 ...
2012-11-02 02:00:00 ...
2012-11-02 03:00:00 ...
2012-11-03 05:00:00 ...
2012-11-03 06:00:00 ...
2012-11-05 00:00:00 ...
我已经花了几个小时,仍然无法弄清楚。请帮我。 :(
I have spent a few hours on this and still cannot figure it out. Please help me. :(
更新:
我尝试使用的当前SQL每天只能选择一条记录:
UPDATE: The current sql I tried could only select one record per day:
SELECT DISTINCT ON (TO_DATE(SUBSTRING((date_time || '') FROM 1 FOR 10), 'YYYY-MM-DD')) *
FROM myTable
WHERE date_time >= '20121101 00:00:00'
AND date_time <= '20121130 23:59:59'
推荐答案
我想每个选择最多 3条记录
SELECT date_time, other_column
FROM (
SELECT *, row_number() OVER (PARTITION BY date_time::date) AS rn
FROM tbl
WHERE date_time >= '2012-11-01 0:0'
AND date_time < '2012-12-01 0:0'
) x
WHERE rn < 4;
要点
-
使用窗口功能
row_number()
。根据问题,rank()
或dense_rank()
是错误的-可能选择了超过3条带有时间戳的记录Major points
Use the window function
row_number()
.rank()
ordense_rank()
would be wrong according to the question - more than 3 records might be selected with timestamp duplicates.由于您没有定义每天想要的 行,正确的答案是不包含<$ c窗口函数中的$ c> ORDER BY 子句。
Since you do not define which rows you want per day, the correct answer is not to include an
ORDER BY
clause in the window function. Gives you an arbitrary selection, which matches the question.我从更改了
WHERE
子句p>
I changed your
WHERE
clause fromWHERE date_time >= '20121101 00:00:00' AND date_time <= '20121130 23:59:59'
至
WHERE date_time >= '2012-11-01 0:0' AND date_time < '2012-12-01 0:0'
对于诸如
'20121130 23:59:59.123'
。@Craig建议:
date_time::date BETWEEN '2012-11-02' AND '2012-11-05'
..可以正常工作,但是在性能方面是反模式。如果对表达式中的数据库列应用强制类型转换或函数,则不能使用普通索引。
.. would work correctly, but is an anti-pattern regarding performance. If you apply a cast or a function to your database column in the expression, plain indexes cannot be used.
最佳解决方案:升级到最新版本,最好是最新版本9.2。
其他解决方案:
仅几天,您可以雇用
UNION ALL
:SELECT date_time, other_column FROM tbl t1 WHERE date_time >= '2012-11-01 0:0' AND date_time < '2012-11-02 0:0' LIMIT 3 ) UNION ALL ( SELECT date_time, other_column FROM tbl t1 WHERE date_time >= '2012-11-02 0:0' AND date_time < '2012-11-03 0:0' LIMIT 3 ) ...
在这里,括号不是可选的。
Parenthesis are not optional here.
几天,存在
generate_series()
的解决方法-类似于我发布的 此处(包括指向更多链接)。For more days there are workarounds with
generate_series()
- something like I posted here (including a link to more).I早在我们拥有窗口功能之前,就可以使用 plpgsql函数解决它了:
I might have solved it with a plpgsql function back in the old days before we had window functions:
CREATE OR REPLACE FUNCTION x.f_foo (date, date, integer , OUT date_time timestamp, OUT other_column text) RETURNS SETOF record AS $BODY$ DECLARE _last_day date; -- remember last day _ct integer := 1; -- count BEGIN FOR date_time, other_column IN SELECT t.date_time, t.other_column FROM tbl t WHERE t.date_time >= $1::timestamp AND t.date_time < ($2 + 1)::timestamp ORDER BY t.date_time::date LOOP IF date_time::date = _last_day THEN _ct := _ct + 1; ELSE _ct := 1; END IF; IF _ct <= $3 THEN RETURN NEXT; END IF; _last_day := date_time::date; END LOOP; END; $BODY$ LANGUAGE plpgsql STABLE STRICT; COMMENT ON FUNCTION f_foo(date3, date, integer) IS 'Return n rows per day $1 .. date_from (incl.) $2 .. date_to (incl.) $3 .. maximim rows per day';
致电:
SELECT * FROM f_foo('2012-11-01', '2012-11-05', 3);
这篇关于每天如何选择1条以上的记录?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!