Perform this hours of operation query in PostgreSQL

Question

I'm in the RoR stack and I had to write some actual SQL to complete this query for all records that are "open", meaning that the current time is within the specified hours of operation. In the hours_of_operations table two integer columns opens_on and closes_on store a weekday, and two time fields opens_at and closes_at store the respective time of the day.

I made a query that compares the current date and time to the stored values but I'm wondering if there is a way to cast to some sort of date type and have PostgreSQL do the rest?

The core of the query is:

WHERE (
 (

 /* Opens in Future */
 (opens_on > 5 OR (opens_on = 5 AND opens_at::time > '2014-03-01 00:27:25.851655'))
 AND (
 (closes_on < opens_on AND closes_on > 5)
 OR ((closes_on = opens_on)
 AND (closes_at::time < opens_at::time AND closes_at::time > '2014-03-01 00:27:25.851655'))
 OR ((closes_on = 5)
 AND (closes_at::time > '2014-03-01 00:27:25.851655' AND closes_at::time < opens_at::time)))
 OR

 /* Opens in Past */
 (opens_on < 5 OR (opens_on = 5 AND opens_at::time < '2014-03-01 00:27:25.851655'))
 AND
 (closes_on > 5)
 OR
 ((closes_on = 5)
 AND (closes_at::time > '2014-03-01 00:27:25.851655'))
 OR (closes_on < opens_on)
 OR ((closes_on = opens_on)
 AND (closes_at::time < opens_at::time))
 )

 )

The reason for such dense complexity is that a period of operation may wrap around the end of the week, for example, starting at noon on Sunday and running through 6 AM Monday. Since I store values in UTC, there are many cases in which the user's local time could wrap in a very strange way. The query above ensures that you can enter ANY two times of the week, and it compensates for the wrapping.

Answer

Table layout

Re-design the table to store opening hours (hours of operation) as a set of tsrange (range of timestamp without time zone) values. Requires Postgres 9.2 or later.

Pick an arbitrary week to stage your opening hours. I like the week:
1996-01-01 (Monday) to 1996-01-07 (Sunday)
At the time of writing, that was the most recent leap year in which Jan 1st conveniently fell on a Monday. But any week works for this purpose. Just be consistent.

Install the additional module btree_gist first:

CREATE EXTENSION btree_gist;

Then create the table like this:

CREATE TABLE hoo (
   hoo_id  serial PRIMARY KEY
 , shop_id int NOT NULL -- REFERENCES shop(shop_id)     -- reference to shop
 , hours   tsrange NOT NULL
 , CONSTRAINT hoo_no_overlap EXCLUDE USING gist (shop_id WITH =, hours WITH &&)
 , CONSTRAINT hoo_bounds_inclusive CHECK (lower_inc(hours) AND upper_inc(hours))
 , CONSTRAINT hoo_standard_week CHECK (hours <@ tsrange '[1996-01-01 0:0, 1996-01-08 0:0]')
);

onehours 替换所有列:

opens_on, closes_on, opens_at, closes_at

For instance, hours of operation from Wednesday, 18:30 to Thursday, 05:00 UTC are entered as:

'[1996-01-03 18:30, 1996-01-04 05:00]'
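Here is a quick sanity check of that literal using only built-in range operators and functions. The range contains any staged point in time between its bounds, including the upper bound itself, and lower_inc() / upper_inc() confirm that both bounds are inclusive. The probe time 1996-01-03 23:00 (Wednesday 23:00 UTC in the staging week) is an arbitrary example value.

```sql
-- Wed 18:30 to Thu 05:00 UTC in the staging week, both bounds inclusive
SELECT tsrange '[1996-01-03 18:30, 1996-01-04 05:00]' @> timestamp '1996-01-03 23:00' AS contains_wed_2300  -- true
     , tsrange '[1996-01-03 18:30, 1996-01-04 05:00]' @> timestamp '1996-01-04 05:00' AS contains_upper     -- true (inclusive)
     , lower_inc(tsrange '[1996-01-03 18:30, 1996-01-04 05:00]')                      AS lower_inclusive    -- true
     , upper_inc(tsrange '[1996-01-03 18:30, 1996-01-04 05:00]')                      AS upper_inclusive;   -- true
```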

The exclusion constraint hoo_no_overlap prevents overlapping entries per shop. It is implemented with a GiST index, which also happens to support our queries. Consider the chapter "Index and Performance" below discussing indexing strategies.

The check constraint hoo_bounds_inclusive enforces inclusive boundaries for your ranges, with two noteworthy consequences:

  • A point in time falling on lower or upper boundary exactly is always included.
  • Adjacent entries for the same shop are effectively disallowed. With inclusive bounds, those would "overlap" and the exclusion constraint would raise an exception. Adjacent entries must be merged into a single row instead. Except when they wrap around Sunday midnight, in which case they must be split into two rows. The function f_hoo_hours() below takes care of this.
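The following sketch shows why adjacent entries count as overlapping. It uses two made-up shifts that touch at Mon 12:00 in the staging week. With inclusive bounds both ranges contain 12:00, so the overlap operator && (the one the exclusion constraint uses) returns true. With half-open bounds they would merely be adjacent. The range union operator + shows the single merged row you would store instead.

```sql
-- Two hypothetical shifts touching at Mon 12:00 in the staging week
SELECT tsrange('1996-01-01 08:00', '1996-01-01 12:00', '[]')
    && tsrange('1996-01-01 12:00', '1996-01-01 18:00', '[]') AS inclusive_overlap   -- true: both contain 12:00
     , tsrange('1996-01-01 08:00', '1996-01-01 12:00', '[)')
    && tsrange('1996-01-01 12:00', '1996-01-01 18:00', '[)') AS half_open_overlap   -- false: only adjacent
     , tsrange('1996-01-01 08:00', '1996-01-01 12:00', '[]')
     + tsrange('1996-01-01 12:00', '1996-01-01 18:00', '[]') AS merged;             -- one range, Mon 08:00 to 18:00
```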

The check constraint hoo_standard_week enforces the outer bounds of the staging week with the "range is contained by" operator <@.

With inclusive bounds, you have to observe a corner case where the time wraps around at Sunday midnight:

'1996-01-01 00:00+0' = '1996-01-08 00:00+0'
 Mon 00:00 = Sun 24:00 (= next Mon 00:00)
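Here is a minimal sketch of what searching for both timestamps means when the normalized time is exactly Mon 00:00. An inline VALUES list with made-up rows stands in for the real hoo table. Shop 1 closes at Sun 24:00 and shop 2 opens at Mon 00:00. Checking only 1996-01-01 00:00 would return shop 2 alone.

```sql
-- Hypothetical sample rows standing in for table hoo
SELECT shop_id
FROM  (
   VALUES (1, tsrange '[1996-01-07 20:00, 1996-01-08 00:00]')  -- closes Sun 24:00
        , (2, tsrange '[1996-01-01 00:00, 1996-01-01 06:00]')  -- opens Mon 00:00
   ) hoo(shop_id, hours)
WHERE  hours @> timestamp '1996-01-01 00:00'   -- Mon 00:00
   OR  hours @> timestamp '1996-01-08 00:00';  -- same instant, written as Sun 24:00
-- returns both shops
```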

You have to search for both timestamps at once. Here is a related case with exclusive upper bound that wouldn't exhibit this shortcoming:

Normalize any given timestamp with time zone:

CREATE OR REPLACE FUNCTION f_hoo_time(timestamptz)
  RETURNS timestamp
  LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
$func$
SELECT timestamp '1996-01-01' + ($1 AT TIME ZONE 'UTC' - date_trunc('week', $1 AT TIME ZONE 'UTC'))
$func$;

PARALLEL SAFE requires Postgres 9.6 or later.

The function takes timestamptz and returns timestamp. It adds the elapsed interval of the respective week ($1 - date_trunc('week', $1)), computed in UTC time, to the starting point of our staging week. (timestamp + interval produces timestamp, and so does date + interval.)
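Here is the same arithmetic worked through for one arbitrary input, inlined so the example does not depend on the function existing. 2016-01-13 20:30+02 is Wednesday 18:30 UTC. Its week starts on Monday 2016-01-11, so 2 days 18:30 have elapsed, and adding that to 1996-01-01 lands on Wednesday 18:30 of the staging week.

```sql
-- Step-by-step version of the f_hoo_time() expression for one sample value
SELECT ts AT TIME ZONE 'UTC'                                              AS utc_time    -- 2016-01-13 18:30:00
     , date_trunc('week', ts AT TIME ZONE 'UTC')                          AS week_start  -- 2016-01-11 00:00:00 (Monday)
     , ts AT TIME ZONE 'UTC' - date_trunc('week', ts AT TIME ZONE 'UTC')  AS elapsed     -- 2 days 18:30:00
     , timestamp '1996-01-01'
     + (ts AT TIME ZONE 'UTC' - date_trunc('week', ts AT TIME ZONE 'UTC')) AS staged     -- 1996-01-03 18:30:00
FROM  (SELECT timestamptz '2016-01-13 20:30+02') t(ts);
```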

Normalize ranges and split those crossing Mon 00:00. This function takes any interval (as two timestamptz) and produces one or two normalized tsrange values. It covers any legal input and disallows the rest:

CREATE OR REPLACE FUNCTION f_hoo_hours(_from timestamptz, _to timestamptz)
  RETURNS TABLE (hoo_hours tsrange)
  LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE COST 500 ROWS 1 AS
$func$
DECLARE
   ts_from timestamp := f_hoo_time(_from);
   ts_to   timestamp := f_hoo_time(_to);
BEGIN
   -- sanity checks (optional)
   IF _to <= _from THEN
      RAISE EXCEPTION '%', '_to must be later than _from!';
   ELSIF _to > _from + interval '1 week' THEN
      RAISE EXCEPTION '%', 'Interval cannot span more than a week!';
   END IF;

   IF ts_from > ts_to THEN  -- split range at Mon 00:00
      RETURN QUERY
      VALUES (tsrange('1996-01-01', ts_to  , '[]'))
           , (tsrange(ts_from, '1996-01-08', '[]'));
   ELSE                     -- simple case: range in standard week
      hoo_hours := tsrange(ts_from, ts_to, '[]');
      RETURN NEXT;
   END IF;

   RETURN;
END
$func$;
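The branching in f_hoo_hours() can also be written in plain SQL. The following is a set-based sketch of the same split for one hypothetical wrapping interval, Sunday 20:00 to Monday 06:00 UTC. It starts from endpoints that are already staged and skips the function's sanity checks. It only illustrates the logic and is not meant to replace the function.

```sql
-- Staged endpoints of Sun 20:00 -> Mon 06:00 UTC; ts_from > ts_to means the range wraps
WITH t(ts_from, ts_to) AS (
   VALUES (timestamp '1996-01-07 20:00', timestamp '1996-01-01 06:00')
)
SELECT tsrange('1996-01-01', ts_to, '[]') AS hoo_hours  -- Mon 00:00 to Mon 06:00
FROM   t WHERE ts_from > ts_to
UNION ALL
SELECT tsrange(ts_from, '1996-01-08', '[]')             -- Sun 20:00 to Sun 24:00
FROM   t WHERE ts_from > ts_to
UNION ALL
SELECT tsrange(ts_from, ts_to, '[]')                    -- non-wrapping case: a single range
FROM   t WHERE ts_from <= ts_to;
```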

INSERT a single input row:

INSERT INTO hoo(shop_id, hours)
SELECT 123, f_hoo_hours('2016-01-11 00:00+04', '2016-01-11 08:00+04');

For any number of input rows:

INSERT INTO hoo(shop_id, hours)
SELECT id, f_hoo_hours(f, t)
FROM  (
   VALUES (7, timestamptz '2016-01-11 00:00+0', timestamptz '2016-01-11 08:00+0')
        , (8, '2016-01-11 00:00+1', '2016-01-11 08:00+1')
   ) t(id, f, t);

Each can insert two rows if a range needs splitting at Mon 00:00 UTC.
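To see which inputs wrap, you can stage both endpoints with the f_hoo_time() arithmetic (inlined here so the block is self-contained) and compare them. This sketch reuses the two sample rows from the multi-row INSERT. Row 7 stays within Monday UTC. Row 8 starts at Sunday 23:00 UTC, so it gets split. The single-row example behaves the same way: 2016-01-11 00:00+04 is Sunday 20:00 UTC, so that call also yields two rows.

```sql
-- Stage both endpoints and flag rows that wrap past Mon 00:00 UTC
SELECT v.id, s.ts_from, s.ts_to
     , s.ts_from > s.ts_to AS needs_split
FROM  (
   VALUES (7, timestamptz '2016-01-11 00:00+0', timestamptz '2016-01-11 08:00+0')
        , (8, '2016-01-11 00:00+1', '2016-01-11 08:00+1')
   ) v(id, f, t)
CROSS  JOIN LATERAL (
   SELECT timestamp '1996-01-01' + (v.f AT TIME ZONE 'UTC' - date_trunc('week', v.f AT TIME ZONE 'UTC'))
        , timestamp '1996-01-01' + (v.t AT TIME ZONE 'UTC' - date_trunc('week', v.t AT TIME ZONE 'UTC'))
   ) s(ts_from, ts_to);
-- id 7: needs_split = false; id 8: needs_split = true
```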

With the adjusted design, your whole big, complex, expensive query can be replaced with ... this:

SELECT *
FROM hoo
WHERE hours @> f_hoo_time(now());
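To try the query without creating the table or the function, here is a self-contained sketch. Two made-up rows stand in for hoo, a fixed timestamptz (Wednesday 2016-01-13 23:00 UTC) stands in for now(), and the f_hoo_time() expression is inlined. Only the Wednesday night shop is open at that moment.

```sql
-- Hypothetical rows standing in for table hoo; a fixed timestamptz stands in for now()
SELECT h.shop_id
FROM  (
   VALUES (1, tsrange '[1996-01-03 18:30, 1996-01-04 05:00]')  -- Wed 18:30 to Thu 05:00
        , (2, tsrange '[1996-01-01 08:00, 1996-01-01 17:00]')  -- Mon 08:00 to Mon 17:00
   ) h(shop_id, hours)
CROSS  JOIN (SELECT timestamptz '2016-01-13 23:00+00') n(ts)   -- a Wednesday
WHERE  h.hours @> (timestamp '1996-01-01'
                 + (n.ts AT TIME ZONE 'UTC' - date_trunc('week', n.ts AT TIME ZONE 'UTC')));
-- returns shop 1 only
```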

The query is backed by said GiST index and fast, even for big tables.

If you want to calculate total opening hours (per shop), here is a recipe:

范围类型的包含运算符 可以支持 GiSTSP-GiST 索引.两者都可用于实现排除约束,但只有 GiST 支持 多列索引:

Currently, only the B-tree, GiST, GIN, and BRIN index types support multicolumn indexes.

The order of index columns matters:

A multicolumn GiST index can be used with query conditions that involve any subset of the index's columns. Conditions on additional columns restrict the entries returned by the index, but the condition on the first column is the most important one for determining how much of the index needs to be scanned. A GiST index will be relatively ineffective if its first column has only a few distinct values, even if there are many distinct values in additional columns.

So we have conflicting interests here. For big tables, there will be many more distinct values for shop_id than for hours.

  • A GiST index with leading shop_id is faster to write and to enforce the exclusion constraint.
  • But we are searching hours in our query. Having that column first would be better.
  • If we need to look up shop_id in other queries, a plain btree index is much faster for that.
  • To top it off, I found an SP-GiST index on just hours to be fastest for the query.

New test with Postgres 12 on an old laptop. My script to generate dummy data:

INSERT INTO hoo(shop_id, hours)
SELECT id
     , f_hoo_hours(((date '1996-01-01' + d) + interval  '4h' + interval '15 min' * trunc(32 * random()))            AT TIME ZONE 'UTC'
                 , ((date '1996-01-01' + d) + interval '12h' + interval '15 min' * trunc(64 * random() * random())) AT TIME ZONE 'UTC')
FROM   generate_series(1, 30000) id
JOIN   generate_series(0, 6) d ON random() > .33;

Results in ~ 141k randomly generated rows, ~ 30k distinct shop_id, ~ 12k distinct hours. Table size 8 MB.

I dropped and recreated the exclusion constraint:

ALTER TABLE hoo
  DROP CONSTRAINT hoo_no_overlap
, ADD CONSTRAINT hoo_no_overlap  EXCLUDE USING gist (shop_id WITH =, hours WITH &&);  -- 3.5 sec; index 8 MB
    
ALTER TABLE hoo
  DROP CONSTRAINT hoo_no_overlap
, ADD CONSTRAINT hoo_no_overlap  EXCLUDE USING gist (hours WITH &&, shop_id WITH =);  -- 13.6 sec; index 12 MB

shop_id first is ~ 4x faster for this distribution.

In addition, I tested two more indexes for read performance:

CREATE INDEX hoo_hours_gist_idx   on hoo USING gist (hours);
CREATE INDEX hoo_hours_spgist_idx on hoo USING spgist (hours);  -- !!

After VACUUM FULL ANALYZE hoo;, I ran two queries:

  • Q1: late night, finding only 35 rows
  • Q2: in the afternoon, finding 4547 rows.

Got an index-only scan for each (except for "no index", of course):

index                 idx size  Q1        Q2
------------------------------------------------
no index                        38.5 ms   38.5 ms 
gist (shop_id, hours)    8MB    17.5 ms   18.4 ms
gist (hours, shop_id)   12MB     0.6 ms    3.4 ms
gist (hours)            11MB     0.3 ms    3.1 ms
spgist (hours)           9MB     0.7 ms    1.8 ms  -- !

  • SP-GiST and GiST are on par for queries finding few results (GiST is even faster for very few).
  • SP-GiST scales better with a growing number of results, and is smaller, too.
  • If you read a lot more than you write (the typical use case), keep the exclusion constraint as suggested at the outset and create an additional SP-GiST index to optimize read performance.
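Here is a sketch of that recommendation as DDL. It uses a hypothetical temporary table hoo_demo so the original hoo table is left alone, and it assumes you are allowed to create the btree_gist extension, which is needed for shop_id WITH = in a GiST index. The exclusion constraint keeps shop_id first for cheaper writes, and a separate SP-GiST index on hours alone serves the hours @> ... lookups.

```sql
-- Requires btree_gist for the "shop_id WITH =" part of the exclusion constraint
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Hypothetical demo table mirroring the layout of hoo
CREATE TEMP TABLE hoo_demo (
   hoo_id  serial PRIMARY KEY
 , shop_id int NOT NULL
 , hours   tsrange NOT NULL
 , CONSTRAINT hoo_demo_no_overlap EXCLUDE USING gist (shop_id WITH =, hours WITH &&)  -- shop_id first: cheaper writes
);

-- Additional read-optimized index on just hours for "hours @> timestamp" queries
CREATE INDEX hoo_demo_hours_spgist_idx ON hoo_demo USING spgist (hours);

-- Optional, only if other queries filter on shop_id:
-- CREATE INDEX hoo_demo_shop_id_idx ON hoo_demo (shop_id);
```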
