每天如何选择1条以上的记录? [英] How to select more than 1 record per day?

查看:104
本文介绍了每天如何选择1条以上的记录?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这是一个 postgresql 问题。

PostgreSQL 8.3.3 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-9).

表格如下:

date_time           other_column
2012-11-01 00:00:00 ...
2012-11-02 01:00:00 ...
2012-11-02 02:00:00 ...
2012-11-02 03:00:00 ...
2012-11-02 04:00:00 ...
2012-11-03 05:00:00 ...
2012-11-03 06:00:00 ...
2012-11-05 00:00:00 ...
2012-11-07 00:00:00 ...
2012-11-07 00:00:00 ...
...

我想从特定日期范围内选择 每天最多3条记录

I want to select at most 3 records per day from a specific date range.

例如,我想从2012-11-02到2012-11-05最多选择3条记录。
预期结果将是:

For example, I want to select at most 3 records from 2012-11-02 to 2012-11-05. The expected result would be:

date_time           other_column
2012-11-02 01:00:00 ...
2012-11-02 02:00:00 ...
2012-11-02 03:00:00 ...
2012-11-03 05:00:00 ...
2012-11-03 06:00:00 ...
2012-11-05 00:00:00 ...

我已经花了几个小时,仍然无法弄清楚。请帮我。 :(

I have spent a few hours on this and still cannot figure it out. Please help me. :(

更新:
我尝试使用的当前SQL每天只能选择一条记录:

UPDATE: The current sql I tried could only select one record per day:

SELECT DISTINCT ON (TO_DATE(SUBSTRING((date_time || '') FROM 1 FOR 10), 'YYYY-MM-DD')) *
FROM myTable
WHERE  date_time >=  '20121101 00:00:00'  
AND  date_time <= '20121130 23:59:59'


推荐答案


我想每个选择最多 3条记录



SELECT date_time, other_column
FROM  (
   SELECT *, row_number() OVER (PARTITION BY date_time::date) AS rn
   FROM   tbl
   WHERE  date_time >= '2012-11-01 0:0'
   AND    date_time <  '2012-12-01 0:0'
   ) x
WHERE  rn < 4;



要点




  • 使用窗口功能 row_number() 。根据问题, rank() dense_rank()是错误的-可能选择了超过3条带有时间戳的记录

    Major points

    • Use the window function row_number(). rank() or dense_rank() would be wrong according to the question - more than 3 records might be selected with timestamp duplicates.

      由于您没有定义每天想要的 行,正确的答案是不包含<$ c窗口函数中的$ c> ORDER BY 子句。

      Since you do not define which rows you want per day, the correct answer is not to include an ORDER BY clause in the window function. Gives you an arbitrary selection, which matches the question.

      我从 WHERE 子句p>

      I changed your WHERE clause from

      WHERE  date_time >= '20121101 00:00:00'  
      AND    date_time <= '20121130 23:59:59'
      

      WHERE  date_time >=  '2012-11-01 0:0'  
      AND    date_time <   '2012-12-01 0:0'
      

      对于诸如'20121130 23:59:59.123'

      @Craig建议:

      date_time::date BETWEEN '2012-11-02' AND '2012-11-05'
      

      ..可以正常工作,但是在性能方面是反模式。如果对表达式中的数据库列应用强制类型转换或函数,则不能使用普通索引。

      .. would work correctly, but is an anti-pattern regarding performance. If you apply a cast or a function to your database column in the expression, plain indexes cannot be used.

      最佳解决方案升级到最新版本,最好是最新版本9.2。

      其他解决方案

      几天,您可以雇用 UNION ALL

      SELECT date_time, other_column
      FROM   tbl t1
      WHERE  date_time >= '2012-11-01 0:0'
      AND    date_time <  '2012-11-02 0:0'
      LIMIT  3
      )
      UNION ALL 
      (
      SELECT date_time, other_column
      FROM   tbl t1
      WHERE  date_time >= '2012-11-02 0:0'
      AND    date_time <  '2012-11-03 0:0'
      LIMIT  3
      )
      ...
      

      在这里,括号不是可选的。

      Parenthesis are not optional here.

      几天,存在 generate_series() 的解决方法-类似于我发布的 此处(包括指向更多链接)

      For more days there are workarounds with generate_series() - something like I posted here (including a link to more).

      I早在我们拥有窗口功能之前,就可以使用 plpgsql函数解决它了:

      I might have solved it with a plpgsql function back in the old days before we had window functions:

      CREATE OR REPLACE FUNCTION x.f_foo (date, date, integer
                               , OUT date_time timestamp, OUT other_column text)
        RETURNS SETOF record AS
      $BODY$
      DECLARE
          _last_day date;          -- remember last day
          _ct       integer := 1;  -- count
      BEGIN
      
      FOR date_time, other_column IN
         SELECT t.date_time, t.other_column
         FROM   tbl t
         WHERE  t.date_time >= $1::timestamp
         AND    t.date_time <  ($2 + 1)::timestamp
         ORDER  BY t.date_time::date
      LOOP
         IF date_time::date = _last_day THEN
            _ct := _ct + 1;
         ELSE
            _ct := 1;
         END IF;
      
         IF _ct <= $3 THEN
            RETURN NEXT;
         END IF;
      
         _last_day := date_time::date;
      END LOOP;
      
      END;
      $BODY$ LANGUAGE plpgsql STABLE STRICT;
      
      COMMENT ON FUNCTION f_foo(date3, date, integer) IS 'Return n rows per day
      $1 .. date_from (incl.)
      $2 .. date_to  (incl.)
      $3 .. maximim rows per day';
      

      致电:

      SELECT * FROM f_foo('2012-11-01', '2012-11-05', 3);
      

      这篇关于每天如何选择1条以上的记录?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆