How do you do date math that ignores the year?

Question

I am trying to select dates that have an anniversary in the next 14 days. How can I select based on dates excluding the year? I have tried something like the following.

SELECT * FROM events
WHERE EXTRACT(month FROM "date") = 3
AND EXTRACT(day FROM "date") < EXTRACT(day FROM "date") + 14

The problem with this is that months wrap.
I would prefer to do something like this, but I don't know how to ignore the year.

SELECT * FROM events
WHERE (date > '2013-03-01' AND date < '2013-04-01')

How can I accomplish this kind of date math in Postgres?

Solution

If you don't care for explanation and details, use the "Black magic version" below.

All queries presented in other answers so far operate with conditions that are not sargable - they cannot use an index and have to compute an expression for every single row in the base table to find matching rows. Doesn't matter much with small tables. Matters (a lot) with big tables.

Given the following simple table:

CREATE TABLE event (
  event_id   serial PRIMARY KEY
, event_date date
);

Query

Version 1. and 2. below can use a simple index of the form:

CREATE INDEX event_event_date_idx ON event(event_date);

But all of the following solutions are fast even without an index.

1. Simple version

SELECT *
FROM  (
   SELECT ((current_date + d) - interval '1 year' * y)::date AS event_date
   FROM       generate_series( 0,  14) d
   CROSS JOIN generate_series(13, 113) y
   ) x
JOIN  event USING (event_date);

Subquery x computes all possible dates over a given range of years from a CROSS JOIN of two generate_series() calls. The selection is done with the final simple join.
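
To get a feel for what subquery x produces, the sketch below counts the candidate dates instead of joining them. It reuses the two generate_series() calls from above unchanged; the column aliases candidates, earliest and latest are only illustrative.

```sql
-- 15 day offsets (0..14) x 101 year offsets (13..113) = 1515 candidate dates
SELECT count(*)        AS candidates
     , min(event_date) AS earliest
     , max(event_date) AS latest
FROM  (
   SELECT ((current_date + d) - interval '1 year' * y)::date AS event_date
   FROM       generate_series( 0,  14) d
   CROSS JOIN generate_series(13, 113) y
   ) x;
```

The count is always 1515 for these bounds. The join against event then only has to probe that small, fixed set of dates, which is why a plain index on event_date can be used.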

2. Advanced version

WITH val AS (
   SELECT extract(year FROM age(current_date + 14, min(event_date)))::int AS max_y
        , extract(year FROM age(current_date,      max(event_date)))::int AS min_y
   FROM   event
   )
SELECT e.*
FROM  (
   SELECT ((current_date + d.d) - interval '1 year' * y.y)::date AS event_date
   FROM   generate_series(0, 14) d
        ,(SELECT generate_series(min_y, max_y) AS y FROM val) y
   ) x
JOIN  event e USING (event_date);

Range of years is deduced from the table automatically - thereby minimizing generated years.
You could go one step further and distill a list of existing years if there are gaps.
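
One possible way to do that, as a rough sketch rather than one of the benchmarked queries: collect the distinct calendar years present in the table and, for each day of the window, build the same month and day in each of those years. The CTE name yrs and the column yr are made up here, and DISTINCT is added because a February 29 in the window maps to February 28 in non-leap years.

```sql
WITH yrs AS (
   -- hypothetical helper: calendar years that actually occur in the table
   SELECT DISTINCT extract(year FROM event_date)::int AS yr
   FROM   event
   )
SELECT e.*
FROM  (
   SELECT DISTINCT
          ((current_date + d) - interval '1 year'
             * (extract(year FROM current_date + d)::int - yr))::date AS event_date
   FROM   generate_series(0, 14) d
   CROSS  JOIN yrs
   ) x
JOIN   event e USING (event_date);
```

Collecting the distinct years reads the whole table or index by itself, so this probably only pays off when that list is cheap to obtain or kept somewhere else.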

Effectiveness also depends on the distribution of dates: few years with many rows each make this solution more useful, while many years with few rows each make it less useful.

3. Black magic version

Updated 2016 to remove a "generated column", which would block H.O.T. updates; simpler and faster function.
Updated 2018 to calculate MMDD with IMMUTABLE expressions to allow function inlining.

Create a simple SQL function to calculate an integer from the pattern 'MMDD':

CREATE FUNCTION f_mmdd(date) RETURNS int LANGUAGE sql IMMUTABLE AS
'SELECT (EXTRACT(month FROM $1) * 100 + EXTRACT(day FROM $1))::int';
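
A quick sanity check of the function, as a sketch with arbitrarily chosen sample dates: the month lands in the hundreds and the day in the units, so the integers sort in calendar order within a year.

```sql
SELECT f_mmdd(date '2014-08-23') AS aug_23   -- 823
     , f_mmdd(date '2014-12-31') AS dec_31   -- 1231
     , f_mmdd(date '2012-02-29') AS feb_29;  -- 229
```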

I had to_char(time, 'MMDD') at first, but switched to the above expression, which proved fastest in new tests on Postgres 9.6 and 10.

It allows function inlining because EXTRACT (xyz FROM date) is implemented with the IMMUTABLE function date_part(text, date) internally. And it has to be IMMUTABLE to allow its use in the following essential multicolumn expression index:

CREATE INDEX event_mmdd_event_date_idx ON event(f_mmdd(event_date), event_date);

Multicolumn for a number of reasons:

  • It can help with ORDER BY or with selecting from given years.

  • It comes at almost no additional cost for the index: a date fits into the 4 bytes that would otherwise be lost to padding due to data alignment.

  • Since both index columns reference the same table column, there is no drawback with regard to H.O.T. updates.
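
To illustrate the point about ORDER BY and given years: a query along the following lines can be served from the multicolumn index, filtering on the MMDD expression and on a range of years at once. The literal bounds are made up for the example, and whether the planner actually picks the index depends on table statistics.

```sql
-- Anniversaries between March 1 and March 14, restricted to events from the 1980s
SELECT *
FROM   event
WHERE  f_mmdd(event_date) BETWEEN 301 AND 314
AND    event_date >= date '1980-01-01'
AND    event_date <  date '1990-01-01'
ORDER  BY f_mmdd(event_date), event_date;
```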

One PL/pgSQL table function to rule them all

Fork to one of two queries to cover the turn of the year:

CREATE OR REPLACE FUNCTION f_anniversary(date = current_date, int = 14)
  RETURNS SETOF event AS
$func$
DECLARE
   d  int := f_mmdd($1);
   d1 int := f_mmdd($1 + $2 - 1);  -- fix off-by-1 from upper bound
BEGIN
   IF d1 > d THEN
      RETURN QUERY
      SELECT *
      FROM   event e
      WHERE  f_mmdd(e.event_date) BETWEEN d AND d1
      ORDER  BY f_mmdd(e.event_date), e.event_date;

   ELSE  -- wrap around end of year
      RETURN QUERY
      SELECT *
      FROM   event e
      WHERE  f_mmdd(e.event_date) >= d OR
             f_mmdd(e.event_date) <= d1
      ORDER  BY (f_mmdd(e.event_date) >= d) DESC, f_mmdd(e.event_date), event_date;
      -- chronological across turn of the year
   END IF;
END
$func$  LANGUAGE plpgsql;

Call using defaults: 14 days beginning "today":

SELECT * FROM f_anniversary();

Call for 7 days beginning '2014-08-23':

SELECT * FROM f_anniversary(date '2014-08-23', 7);
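
A call that crosses the turn of the year exercises the ELSE branch. In this sketch (the start date is chosen arbitrarily), d is 1225 and d1 is f_mmdd(date '2015-01-07') = 107, so d1 > d is false and the wrap-around query runs:

```sql
-- 14 days beginning 2014-12-25: covers Dec 25 .. Dec 31 and Jan 1 .. Jan 7
SELECT * FROM f_anniversary(date '2014-12-25', 14);
```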

February 29

When dealing with anniversaries or "birthdays", you need to define how to deal with the special case "February 29" in leap years.

When testing for ranges of dates, Feb 29 is usually included automatically, even if the current year is not a leap year. The range of days is extended by 1 retroactively when it covers this day.
On the other hand, if the current year is a leap year and you look for 15 days, you may end up with results for only 14 days if your data is from non-leap years.

Say Bob was born on February 29:
My queries 1. and 2. include February 29 only in leap years, so Bob has a birthday only every ~4 years.
My query 3. includes February 29 in the range, so Bob has a birthday every year.
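
The difference can be reproduced with literal dates. This is only a sketch; the sample dates are picked for illustration and stand in for a window in a non-leap year.

```sql
-- Queries 1. and 2. subtract whole years from dates in the current window.
-- Starting from a non-leap year the result is never Feb 29, so Bob's row is not matched:
SELECT (date '2015-02-28' - interval '3 years')::date AS from_feb_28   -- 2012-02-28
     , (date '2015-03-01' - interval '3 years')::date AS from_mar_01;  -- 2012-03-01

-- Query 3. compares MMDD integers, and 229 lies inside any range spanning Feb 28 to Mar 1:
SELECT f_mmdd(date '2012-02-29')
       BETWEEN f_mmdd(date '2015-02-22') AND f_mmdd(date '2015-03-07') AS bob_included;  -- true
```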

There is no magical solution. You have to define what you want for every case.

Test

To substantiate my point I ran an extensive test with all the presented solutions. I adapted each of the queries to the given table and to yield identical results without ORDER BY.

The good news: all of them are correct and yield the same result - except for Gordon's query that had syntax errors, and @wildplasser's query that fails when the year wraps around (easy to fix).

Insert 108000 rows with random dates from the 20th century, which is similar to a table of living people (13 or older).

INSERT INTO  event (event_date)
SELECT '2000-1-1'::date - (random() * 36525)::int
FROM   generate_series (1, 108000);

Delete ~ 8 % to create some dead tuples and make the table more "real life".

DELETE FROM event WHERE random() < 0.08;
ANALYZE event;

My test case had 99289 rows, 4012 hits.

C - Catcall

WITH anniversaries as (
   SELECT event_id, event_date
         ,(event_date + (n || ' years')::interval)::date anniversary
   FROM   event, generate_series(13, 113) n
   )
SELECT event_id, event_date -- count(*)   --
FROM   anniversaries
WHERE  anniversary BETWEEN current_date AND current_date + interval '14' day;

C1 - Catcall's idea rewritten

Aside from minor optimizations, the major difference is to add only the exact amount of years date_trunc('year', age(current_date + 14, event_date)) to get this year's anniversary, which avoids the need for a CTE altogether:

SELECT event_id, event_date
FROM   event
WHERE (event_date + date_trunc('year', age(current_date + 14, event_date)))::date
       BETWEEN current_date AND current_date + 14;
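
To see what the added interval looks like, the expression can be evaluated for a single pair of dates. The sketch below uses fixed sample dates in place of current_date + 14 and event_date; the column aliases are illustrative.

```sql
SELECT age(date '2013-03-15', date '1980-03-10')                     AS full_age     -- 33 years 5 days
     , date_trunc('year', age(date '2013-03-15', date '1980-03-10')) AS whole_years  -- 33 years
     , (date '1980-03-10'
        + date_trunc('year', age(date '2013-03-15', date '1980-03-10')))::date AS anniversary;  -- 2013-03-10
```

Adding only the whole years moves the event date into the year of the window, so a single BETWEEN on that shifted date is enough.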

D - Daniel

SELECT *   -- count(*)   -- 
FROM   event
WHERE  extract(month FROM age(current_date + 14, event_date))  = 0
AND    extract(day   FROM age(current_date + 14, event_date)) <= 14;

E1 - Erwin 1

See "1. Simple version" above.

E2 - Erwin 2

See "2. Advanced version" above.

E3 - Erwin 3

See "3. Black magic version" above.

G - Gordon

SELECT * -- count(*)   
FROM  (SELECT *, to_char(event_date, 'MM-DD') AS mmdd FROM event) e
WHERE  to_date(to_char(now(), 'YYYY') || '-'
                 || (CASE WHEN mmdd = '02-29' THEN '02-28' ELSE mmdd END)
              ,'YYYY-MM-DD') BETWEEN date(now()) and date(now()) + 14;

H - a_horse_with_no_name

WITH upcoming as (
   SELECT event_id, event_date
         ,CASE 
            WHEN date_trunc('year', age(event_date)) = age(event_date)
                 THEN current_date
            ELSE cast(event_date + ((extract(year FROM age(event_date)) + 1)
                      * interval '1' year) AS date) 
          END AS next_event
   FROM event
   )
SELECT event_id, event_date
FROM   upcoming
WHERE  next_event - current_date  <= 14;

W - wildplasser

CREATE OR REPLACE FUNCTION this_years_birthday(_dut date) RETURNS date AS
$func$
DECLARE
    ret date;
BEGIN
    ret :=
    date_trunc( 'year' , current_timestamp)
        + (date_trunc( 'day' , _dut)
         - date_trunc( 'year' , _dut));
    RETURN ret;
END
$func$ LANGUAGE plpgsql;

Simplified to return the same as all the others:

SELECT *
FROM   event e
WHERE  this_years_birthday( e.event_date::date )
        BETWEEN current_date
        AND     current_date + '2weeks'::interval;

W1 - wildplasser's query rewritten

The above suffers from a number of inefficient details (beyond the scope of this already sizable post). The rewritten version is much faster:

CREATE OR REPLACE FUNCTION this_years_birthday(_dut INOUT date) AS
$func$
SELECT (date_trunc('year', now()) + ($1 - date_trunc('year', $1)))::date
$func$ LANGUAGE sql;

SELECT *
FROM   event e
WHERE  this_years_birthday(e.event_date)
        BETWEEN current_date
        AND    (current_date + 14);

Test results

I ran this test with a temporary table on PostgreSQL 9.1.7. Results were gathered with EXPLAIN ANALYZE, best of 5.

Results

Without index
C:  Total runtime: 76714.723 ms
C1: Total runtime:   307.987 ms  -- !
D:  Total runtime:   325.549 ms
E1: Total runtime:   253.671 ms  -- !
E2: Total runtime:   484.698 ms  -- min() & max() expensive without index
E3: Total runtime:   213.805 ms  -- !
G:  Total runtime:   984.788 ms
H:  Total runtime:   977.297 ms
W:  Total runtime:  2668.092 ms
W1: Total runtime:   596.849 ms  -- !

With index
E1: Total runtime:    37.939 ms  --!!
E2: Total runtime:    38.097 ms  --!!

With index on expression
E3: Total runtime:    11.837 ms  --!!

All other queries perform the same with or without index because they use non-sargable expressions.
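
One way to check this on your own data is to compare plans. The sketch below assumes the expression index from the black magic section exists; the literal range 301 to 314 is arbitrary. The first query can use the index because the indexed expression stands alone on one side of the comparison. The second wraps the column in age() together with current_date, which matches no index on event_date or f_mmdd(event_date).

```sql
-- Sargable: matches the index on (f_mmdd(event_date), event_date)
EXPLAIN ANALYZE
SELECT * FROM event
WHERE  f_mmdd(event_date) BETWEEN 301 AND 314;

-- Not sargable: the expression has to be evaluated for every row
EXPLAIN ANALYZE
SELECT * FROM event
WHERE  extract(month FROM age(current_date + 14, event_date)) = 0
AND    extract(day   FROM age(current_date + 14, event_date)) <= 14;
```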

Conclusion

  • Among the queries from the other answers, @Daniel's was the fastest.

  • @wildplasser's (rewritten) approach performs acceptably, too.

  • @Catcall's version is something like the reverse approach of mine. Performance gets out of hand quickly with bigger tables.
    The rewritten version performs pretty well, though. The expression I use is something like a simpler version of @wildplasser's this_years_birthday() function.

  • My "simple version" is faster even without index, because it needs fewer computations.

  • With index, the "advanced version" is about as fast as the "simple version", because min() and max() become very cheap with an index. Both are substantially faster than the rest which cannot use the index.

  • My "black magic version" is fastest with or without index. And it is very simple to call.
    The updated version (after the benchmark) is a bit faster still.

  • With a real-life table, an index will make an even greater difference. More columns make the table bigger and a sequential scan more expensive, while the index size stays the same.
