How do you do date math that ignores the year?


Question

I am trying to select dates that have an anniversary in the next 14 days. How can I select based on dates excluding the year? I have tried something like the following.

SELECT * FROM events
WHERE EXTRACT(month FROM "date") = 3
AND EXTRACT(day FROM "date") < EXTRACT(day FROM "date") + 14

The problem with this is that months wrap.
I would prefer to do something like this, but I don't know how to ignore the year.

SELECT * FROM events
WHERE (date > '2013-03-01' AND date < '2013-04-01')

How can I accomplish this kind of date math in Postgres?

Solution

If you don't care for explanation and details, use the "Black magic version" below.

All queries presented so far operate with conditions that are not sargable - they cannot use an index and have to compute an expression for every single row in the base table to find matching rows. With small tables, this doesn't matter much. With big tables, however, this matters a lot.
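To make the term concrete, here is a minimal sketch. The scratch table `demo_event` and its sample rows are hypothetical and only serve the illustration. The first predicate wraps the column in an expression, so a plain index on the column cannot be used to narrow down rows. The second compares the bare column to constants and can use it, but it does not ignore the year, which is the whole problem.

```sql
-- Hypothetical scratch table, for illustration only.
CREATE TEMP TABLE demo_event (event_date date);
CREATE INDEX ON demo_event (event_date);
INSERT INTO demo_event VALUES ('2013-03-05'), ('1980-03-05'), ('2013-07-01');

-- Not sargable: the indexed column is hidden inside an expression,
-- so the expression has to be evaluated for every row.
EXPLAIN
SELECT * FROM demo_event
WHERE  extract(month FROM event_date) = 3;

-- Sargable: the bare column is compared to constants,
-- so a plain b-tree index on event_date is applicable.
EXPLAIN
SELECT * FROM demo_event
WHERE  event_date >= date '2013-03-01'
AND    event_date <  date '2013-04-01';
```

On a table this small the planner will most likely choose a sequential scan either way; the difference only shows in the plans once the table is big enough.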

Given the following simple table:

CREATE TABLE event (
  event_id serial PRIMARY KEY
, event_date date
);

Query

Version 1. and 2. can use a simple index of the form:

CREATE INDEX event_event_date_idx ON event(event_date);

But all of the following solutions are faster even without an index.

1. Simple version

SELECT *
FROM  (
   SELECT ((current_date + d) - interval '1 year' * y)::date AS event_date
   FROM       generate_series(0, 14)   d
   CROSS JOIN generate_series(13, 113) y
   ) x
JOIN  event USING (event_date);

Subquery x computes all possible dates over a given range of years from a CROSS JOIN of two generate_series() calls. The selection is done with a simple equi-join.
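As a rough illustration of what the subquery produces, the sketch below counts the candidate dates and lists a few of them. It assumes a fixed start date of 2013-03-01 in place of `current_date` and a deliberately tiny range, so that the output is reproducible.

```sql
-- 15 day offsets (0..14) times 101 year offsets (13..113)
SELECT count(*) AS candidate_dates   -- 1515
FROM   generate_series(0, 14)   d
CROSS  JOIN generate_series(13, 113) y;

-- The same expression as in subquery x, with a fixed date
-- instead of current_date and only 2 x 2 combinations.
SELECT ((date '2013-03-01' + d) - interval '1 year' * y)::date AS event_date
FROM   generate_series(0, 1)   d
CROSS  JOIN generate_series(13, 14) y
ORDER  BY 1;
-- 1999-03-01
-- 1999-03-02
-- 2000-03-01
-- 2000-03-02
```

Only these few candidate dates are then matched against the table by equality, which is what lets a plain index on `event_date` do the work.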

2. Advanced version

WITH val AS (
   SELECT extract(year FROM age(now()::date + 14, min(event_date)))::int AS max_y
        , extract(year FROM age(now()::date,      max(event_date)))::int AS min_y
   FROM   event
   )
SELECT e.* -- count(*) --
FROM  (
   SELECT ((current_date + d) - interval '1y' * y.y)::date AS event_date
   FROM   generate_series(0, 14) AS d
         ,(SELECT generate_series(min_y, max_y) AS y FROM val) y
   ) x
JOIN  event e USING (event_date);

Range of years is deduced from the table automatically - thereby minimizing the generated years.
You could even go one step further and distill a list of existing years if you have gaps in your range of years.

Effectiveness depends on the distribution of dates. Few years with many rows each make my solution more useful. Many years with few rows each make it less useful.

Simple SQL Fiddle to play with.

3. Black magic version

Later I had an idea for this radically new approach.
Updated 02.2016 to remove the unnecessary "generated column" from the solution, which would block HOT updates, and to use a simpler and faster function.

Simple SQL function to calculate an integer from the pattern 'MMDD'.

CREATE FUNCTION f_mmdd(date) RETURNS int LANGUAGE sql IMMUTABLE AS
$$SELECT to_char($1, 'MMDD')::int$$;

It has to be IMMUTABLE to be used in an index:

CREATE INDEX event_mmdd_event_date_idx  ON event(f_mmdd(event_date), event_date);

A multicolumn index, for a number of reasons: it can help with ORDER BY or with selecting from given years, and it comes at almost no additional cost. A date fits into the 4 bytes that would otherwise be lost to padding due to data alignment.
Also, since both index columns reference the same table column, there is no drawback with regard to HOT updates.
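To show what the indexed expression evaluates to, here is a small sketch that applies the body of `f_mmdd()` directly to a few sample dates (the dates are picked for illustration). Leading zeros of the month disappear in the cast to integer, but the values still sort in calendar order within a year.

```sql
-- Same expression as in the body of f_mmdd(), applied to sample dates.
SELECT to_char(date '2014-08-23', 'MMDD')::int AS aug_23   -- 823
     , to_char(date '2013-12-25', 'MMDD')::int AS dec_25   -- 1225
     , to_char(date '2014-01-07', 'MMDD')::int AS jan_07   -- 107
     , to_char(date '2012-02-29', 'MMDD')::int AS feb_29;  -- 229
```

Because the year is dropped, a range of days becomes a simple integer range on the expression, which is what the first index column supports.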

One PL/pgSQL table function to rule them all

Fork to one of two queries to cover the turn of the year.

CREATE OR REPLACE FUNCTION f_anniversary(date = current_date, int = 14)
  RETURNS SETOF event AS
$func$
DECLARE
   d  int := f_mmdd($1);
   d1 int := f_mmdd($1 + $2 - 1);  -- fix off-by-1 due to including upper bound
BEGIN
   IF d1 > d THEN
      RETURN QUERY
      SELECT *
      FROM   event e
      WHERE  f_mmdd(e.event_date) BETWEEN d AND d1
      ORDER  BY f_mmdd(e.event_date), e.event_date;

   ELSE  -- wrap around end of year
      RETURN QUERY
      SELECT *
      FROM   event e
      WHERE  f_mmdd(e.event_date) >= d OR
             f_mmdd(e.event_date) <= d1
      ORDER  BY (f_mmdd(e.event_date) >= d) DESC, f_mmdd(e.event_date), event_date;
      -- chronological across turn of the year
   END IF;
END
$func$  LANGUAGE plpgsql ;

Call using defaults: 14 days beginning "today":

SELECT * FROM f_anniversary();

Call for 7 days beginning '2014-08-23':

SELECT * FROM f_anniversary('2014-08-23'::date, 7);
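To see which of the two branches a call takes, the following sketch computes the bounds `d` and `d1` the same way the function does, for the call above and for a hypothetical call that crosses the turn of the year. The column name `single_range` is made up for the illustration.

```sql
-- d  = MMDD of the start date
-- d1 = MMDD of the last day in the range (start + days - 1)
SELECT to_char(start_date, 'MMDD')::int             AS d
     , to_char(start_date + days - 1, 'MMDD')::int  AS d1
     , to_char(start_date + days - 1, 'MMDD')::int
       > to_char(start_date, 'MMDD')::int           AS single_range
FROM  (VALUES (date '2014-08-23', 7)
            , (date '2013-12-25', 14)) t(start_date, days);
--    d  |  d1 | single_range
--   823 | 829 | t    -> first branch:  BETWEEN 823 AND 829
--  1225 | 107 | f    -> second branch: >= 1225 OR <= 107
```

In the second case the range runs from December 25 to January 7, so no single BETWEEN can cover it, and the function falls back to the OR condition.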

SQL Fiddle comparing EXPLAIN ANALYZE.

February 29

When dealing with anniversaries or "birthdays", you need to define how to deal with the special case February 29 in leap years.

When testing for ranges of dates, Feb 29 is usually included automatically, even if the current year is not a leap year. The range of days is extended by 1 retroactively when it covers this day.
On the other hand, if the current year is a leap year and you look for 15 days, you may end up with results for only 14 calendar days if your data is from non-leap years, since those years have no February 29.

Say Bob was born on the 29th of February:
My queries 1. and 2. include February 29 only in leap years. Bob has a birthday only about every 4 years.
My query 3. includes February 29 in the range. Bob has a birthday every year.

There is no magical solution. You have to define what you want for every case.
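The difference can be reproduced with fixed dates. This is only a sketch with dates chosen for illustration: 2015 stands in for a non-leap "current year", 2016 for a leap year, and Bob's birthday is assumed to be 2012-02-29.

```sql
SELECT (date '2016-02-29' - interval '4 years')::date AS from_leap_year   -- 2012-02-29
     , (date '2015-02-28' - interval '3 years')::date AS from_feb_28      -- 2012-02-28
     , (date '2015-03-01' - interval '3 years')::date AS from_mar_01      -- 2012-03-01
     , to_char(date '2012-02-29', 'MMDD')::int
       BETWEEN 228 AND 301                             AS mmdd_in_range;  -- true
```

Going back from 2015, no candidate date lands on 2012-02-29, so the equality join of queries 1. and 2. skips Bob. The integer 229 always lies between 228 and 301, so query 3. finds him every year.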

Test

To substantiate my point I ran an extensive test with all the presented solutions. I adapted each of the queries to the given table and to yield identical results without ORDER BY.

The good news: all of them are correct and yield the same result - except for Gordon's query that had syntax errors, and @wildplasser's query that fails when the year wraps around (easy to fix).

Insert 108000 rows with random dates from the 20th century, which is similar to a table of living people (13 or older).

INSERT INTO  event (event_date)
SELECT '2000-1-1'::date - (random() * 36525)::int
FROM   generate_series (1, 108000);

Delete ~ 8 % to create some dead tuples and make the table more "real life".

DELETE FROM event WHERE random() < 0.08;
ANALYZE event;

My test case had 99289 rows, 4012 hits.

C - Catcall

WITH anniversaries as (
   SELECT event_id, event_date
         ,(event_date + (n || ' years')::interval)::date anniversary
   FROM   event, generate_series(13, 113) n
   )
SELECT event_id, event_date -- count(*)   --
FROM   anniversaries
WHERE  anniversary BETWEEN current_date AND current_date + interval '14' day;

C1 - Catcall's idea rewritten

Aside from minor optimizations, the major difference is to add only the exact amount of years date_trunc('year', age(current_date + 14, event_date)) to get this year's anniversary, which avoids the need for a CTE altogether:

SELECT event_id, event_date
FROM   event
WHERE (event_date + date_trunc('year', age(current_date + 14, event_date)))::date
       BETWEEN current_date AND current_date + 14;
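A worked example of that expression, as a sketch with fixed dates: it assumes "today" is 2013-03-01 (so `current_date + 14` is 2013-03-15) and an event dated 1980-03-10. The interval values in the comments are shown in the default output style.

```sql
SELECT age(date '2013-03-15', date '1980-03-10')        AS full_age       -- 33 years 5 days
     , date_trunc('year',
                  age(date '2013-03-15', date '1980-03-10')) AS whole_years    -- 33 years
     , (date '1980-03-10'
        + date_trunc('year',
                     age(date '2013-03-15', date '1980-03-10')))::date AS anniversary;  -- 2013-03-10
```

The computed anniversary 2013-03-10 falls between 2013-03-01 and 2013-03-15, so this row would qualify. The expression still has to be evaluated for every row, which is why this query cannot use a plain index either.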

D - Daniel

SELECT *   -- count(*)   -- 
FROM   event
WHERE  extract(month FROM age(current_date + 14, event_date))  = 0
AND    extract(day   FROM age(current_date + 14, event_date)) <= 14;

E1 - Erwin 1

See "1. Simple version" above.

E2 - Erwin 2

See "2. Advanced version" above.

E3 - Erwin 3

See "3. Black magic version" above.

G - Gordon

SELECT * -- count(*)   
FROM  (SELECT *, to_char(event_date, 'MM-DD') AS mmdd FROM event) e
WHERE  to_date(to_char(now(), 'YYYY') || '-'
                 || (CASE WHEN mmdd = '02-29' THEN '02-28' ELSE mmdd END)
              ,'YYYY-MM-DD') BETWEEN date(now()) and date(now()) + 14;

H - a_horse_with_no_name

WITH upcoming as (
   SELECT event_id, event_date
         ,CASE 
            WHEN date_trunc('year', age(event_date)) = age(event_date)
                 THEN current_date
            ELSE cast(event_date + ((extract(year FROM age(event_date)) + 1)
                      * interval '1' year) AS date) 
          END AS next_event
   FROM event
   )
SELECT event_id, event_date
FROM   upcoming
WHERE  next_event - current_date  <= 14;

W - wildplasser

CREATE OR REPLACE FUNCTION this_years_birthday(_dut date) RETURNS date AS
$func$
DECLARE
    ret date;
BEGIN
    ret :=
    date_trunc( 'year' , current_timestamp)
        + (date_trunc( 'day' , _dut)
         - date_trunc( 'year' , _dut));
    RETURN ret;
END
$func$ LANGUAGE plpgsql;

Simplified to return the same as all the others:

SELECT *
FROM   event e
WHERE  this_years_birthday( e.event_date::date )
        BETWEEN current_date
        AND     current_date + '2weeks'::interval;

W1 - wildplasser's query rewritten

The above suffers from a number of inefficient details (beyond the scope of this already sizable post). The rewritten version is much faster:

CREATE OR REPLACE FUNCTION this_years_birthday(_dut INOUT date) AS
$func$
SELECT (date_trunc('year', now()) + ($1 - date_trunc('year', $1)))::date
$func$ LANGUAGE sql;

SELECT *
FROM   event e
WHERE  this_years_birthday(e.event_date)
        BETWEEN current_date
        AND    (current_date + 14);

Test results

I ran this test with a temporary table on PostgreSQL 9.1.7. Results were gathered with EXPLAIN ANALYZE, best of 5.

Results

Without index
C:  Total runtime: 76714.723 ms
C1: Total runtime: 307.987 ms   -- !
D:  Total runtime: 325.549 ms
E1: Total runtime: 253.671 ms  -- !
E2: Total runtime: 484.698 ms   -- min() & max() expensive without index
E3: Total runtime: 213.805 ms  -- !
G:  Total runtime: 984.788 ms
H:  Total runtime: 977.297 ms
W:  Total runtime: 2668.092 ms
W1: Total runtime: 596.849 ms   -- !

With index
E1: Total runtime: 37.939 ms   --!!
E2: Total runtime: 38.097 ms   --!!

With index on expression
E3: Total runtime: 11.837 ms   --!!

All other queries perform the same with or without index because they use non-sargable expressions.

Conclusion

  • Among the previously posted queries, @Daniel's was the fastest.

  • @wildplasser's (rewritten) approach performs acceptably, too.

  • @Catcall's version is something like the reverse approach of mine. Performance gets out of hand quickly with bigger tables.
    The rewritten version performs pretty well, though. The expression I use is something like a simpler version of @wildplasser's this_years_birthday() function.

  • My "simple version" is faster even without index, because it needs fewer computations.

  • With index, the "advanced version" is about as fast as the "simple version", because min() and max() become very cheap with an index. Both are substantially faster than the rest which cannot use the index.

  • My "black magic version" is fastest with or without index. And it is very simple to call.
    The updated version (after the benchmark) is a bit faster still.

  • With a real-life table, an index will make an even greater difference. More columns make the table bigger and a sequential scan more expensive, while the index size stays the same.
