在PostgreSQL中执行这个小时的操作查询 [英] Perform this hours of operation query in PostgreSQL

查看:1374
本文介绍了在PostgreSQL中执行这个小时的操作查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在RoR堆栈中,我不得不写一些实际的SQL来完成对所有打开记录的查询,这意味着当前时间在指定的操作时间内。在 hours_of_operations 表两个整数 opens_on closes_on 存储一个工作日,两个时间字段 opens_at 和<$



我做了一个查询,将当前日期和时间与存储的值进行比较但是我想知道是否有一种方法来转换到某种日期类型,并且PostgreSQL做了其余的工作?



查询的核心是:

  WHERE(


/ *打开未来* /
(opens_on> ; 5 OR(opens_on = 5 AND opens_at :: time>'2014-03-01 00:27:25.851655'))
AND(
(closes_on< opens_on AND closes_on> 5)
OR((closes_on = opens_on)
AND(closes_at :: time< opens_at :: time AND closes_at :: time>'2014-03-01 00:27:25.851655'))
OR((closes_on = 5)
AND(closes_at :: time>'2014-03-01 00:27:25.851655'AND closes_at :: time< opens_at :: time)))
OR

/ *在过去打开* /
(opens_on< 5 OR(opens_on = 5 AND opens_at :: time<'2014-03-01 00:27:25.851655'))
AND
(closes_on> 5)
OR $ b b((closes_on = 5)
AND(closes_at :: time>'2014-03-01 00:27:25.851655'))
OR(closes_on< opens_on)
OR (closes_on = opens_on)
AND(closes_at :: time< opens_at :: time))



pre>

这种密集复杂性的原因是一个小时的操作可能在周末结束,例如,从星期日中午开始,经过上午6点星期一。由于我以UTC存储值,在许多情况下,用户的本地时间可能以非常奇怪的方式包装。

解决方案

表格布局

h2>

重新设计表和商店营业时间(营业时间)为 tsrange (不包含时区的时间范围) 。需要Postgres 9.2或更高版本



选择一个随机周来开始营业时间。我喜欢这个星期:

1996-01-01(星期一) 1996-01-07(星期日)

这是最近的闰年,1月1日方便地是一个星期一。但它可以是任何随机周的这种情况。



安装附加模块 btree_gist 为什么?

  CREATE EXTENSION btree_gist; 

创建如下表格:

  CREATE TABLE hoo(
hoo_id serial PRIMARY KEY
,shop_id int NOT NULL REFERENCES shop(shop_id) - 商店参考
,小时tsrange NOT NULL
,CONSTRAINT hoo_no_overlap EXCLUDE使用gist(shop_id与=,小时WITH&&)
,CONSTRAINT hoo_bounds_inclusive CHECK(lower_inc(小时)AND upper_inc(小时))
,CONSTRAINT hoo_standard_week CHECK < @ tsrange'[1996-01-01 0:0,1996-01-08 0:0]')
);

小时替换所有列:

   opens_on,closes_on,opens_at,closes_at   

例如,营业时间从星期三,18:30到星期四,05:00 UTC输入为:

 '[1996-01-03 18:30,1996-01-04 05 :00]'

排除约束 hoo_no_overlap 可防止每个商店的重复条目。它使用 GiST索引实现,这也支持您的查询。



检查约束 hoo_bounds_inclusive 请参阅下面的索引和性能 / code> 为您的范围强制包含边界,具有两个值得注意的后果:





  • 相同商店的相邻条目实际上是不允许的。具有包容性边界,这些将重叠,排除约束将引起例外。相邻条目必须合并到单个行中。除非他们围绕星期日午夜,在这种情况下,他们必须分成两行。 code> 使用范围由运算符< @ 包含。



    的范围,您必须遵守周末午夜周末时间的特殊情况:

     '1996-01-01 00:00 + 0'='1996-01-08 00:00 + 0'
    星期一00:00 =星期日24:00(=下一个星期一00:00 )

    您必须同时搜索两个时间戳。以下是具有 独占 上限的相关案例,不会显示此缺点:





    功能 f_hoo_time(timestamptz)



    使用时区标准化任何给定的时间戳:

      CREATE OR REPLACE FUNCTION f_hoo_time(timestamptz)
    RETURNS timestamp AS
    $ func $
    SELECT date'1996-01-01'
    + AT TIME ZONE'UTC' - date_trunc('week',$ 1 AT TIME ZONE'UTC'))
    $ func $ LANGUAGE sql IMMUTABLE;

    函数 timestamptz c $ c> timestamp 。它将在UTC时间(!)中的相应周($ 1 - date_trunc('week',$ 1)的已用间隔添加到我们的临时周的起始点code> date + interval 产生 timestamp 。)



    函数 f_hoo_hours(timestamptz,timestamptz)



    这个函数需要任何间隔(两个 timestamptz ),并产生一个或两个标准化的 tsrange 包括 任何 合法输入,并禁止其余部分:

      CREATE OR REPLACE FUNCTION f_hoo_hours(_from timestamptz,_to timestamptz)
    RETURNS TABLE(hoo_hours tsrange)AS
    $ func $
    DECLARE
    ts_from timestamp:= f_hoo_time(_from);
    ts_to timestamp:= f_hoo_time(_to);
    BEGIN
    - 测试输入sanity(可选)
    IF _to< = _from THEN
    RAISE EXCEPTION'%' '_to必须晚于_from!';
    ELSIF _to> _from + interval'1 week'THEN
    RAISE EXCEPTION'%','间隔不能超过一周!
    END IF;

    如果ts_from> ts_to THEN - 分割范围为星期一00:00
    返回查询
    VALUES(tsrange('1996-01-01 0:0',ts_to,'[]'))
    , tsrange(ts_from,'1996-01-08 0:0','[]'));
    ELSE - 简单情况:标准周的范围
    hoo_hours:= tsrange(ts_from,ts_to,'[]');
    RETURN NEXT;
    END IF;

    RETURN;
    END
    $ func $ LANGUAGE plpgsql IMMUTABLE COST 1000 ROWS 1;

    INSERT / em>输入行:

      INSERT INTO hoo(shop_id,小时)
    SELECT 123,f_hoo_hours 01-11 00:00 + 04','2016-01-11 08:00 + 04');

    如果范围需要在Mon 00分割,这会导致两行 00。



    INSERT 多个输入行:

      INSERT INTO hoo(shop_id,hours)
    SELECT id,hours
    FROM(
    VALUES(7,timestamp' -01-11 00:00',timestamp'2016-01-11 08:00')
    ,(8,'2016-01-11 00:00','2016-01-11 08:00' )
    )t(id,f,t),f_hoo_hours(f,t)小时; - LATERAL join

    关于隐式 LATERAL 加入:





查询



使用调整后的设计,您的整个大,复杂,昂贵的查询 这:


SELECT *

FROM hoo

WHERE小时@> ; f_hoo_time(now());


扰流板在解决方案。



查询是由GiST索引支持的,即使对于大表也是如此。



SQL Fiddle



如果您想计算总开放时间(每间商店),以下是一个配方:





索引和性能



包含操作符范围类型可以通过 GiST a>或 SP-GiST 索引。两者都可用于实施排除约束,但只有GiST支持多列索引


目前,只有B树,GiST,GIN和BRIN索引类型支持多列索引。


并且索引列顺序问题


多列GiST索引可用于查询条件
涉及索引列的任何子集。额外的
列的条件限制索引返回的条目,但是第一列上的条件
是确定需要扫描索引的
的最重要的条目。如果第一列只有几个不同的值,GiST索引将相对
无效,即使
,如果附加列中有许多不同的值。


因此,我们在这里有冲突的利益。对于大表, shop_id 小时有更多不同的值。




  • 具有领导 shop_id 的GiST索引写入速度更快,并强制实施排除约束。

  • 但是我们在查询中搜索 hours 列。

  • 如果我们需要在其他查询中查找 shop_id ,那么纯btree索引会更快

  • 为了优先,我在小时上找到了 SP-GiST >

    p>我的脚本生成哑元数据:

      INSERT INTO hoo(shop_id,小时)
    SELECT id,hours
    FROM generate_series(1,30000)id,generate_series(0,6)d
    ,f_hoo_hours((date'1996-01-01'+ d)+ interval'4h'+ interval '15 min' * trunc(32 * random()))AT TIME ZONE'UTC'
    ,((date'1996-01-01'+ d)+ interval'12h'+ interval '15 min'* trunc random()* random()))AT TIME ZONE'UTC')AS小时
    WHERE random()> .33;

    结果为141k个随机生成的行,30k不同 shop_id ,12k distinct 小时。 (通常差异会更大。)表大小8 MB。



    我删除并重新创建了排除约束:

      ALTER TABLE hoo ADD CONSTRAINT hoo_no_overlap 
    EXCLUDE使用gist(shop_id WITH =,小时WITH&& - 4.4秒!

    ALTER TABLE hoo ADD CONSTRAINT hoo_no_overlap
    EXCLUDE USING gist(hours with&& shop_id WITH =); - 16.4秒

    shop_id 4倍速。



    此外,我还测试了两个读取性能:

      CREATE INDEX hoo_hours_gist_idx on hoo USING gist(hours); 
    CREATE INDEX hoo_hours_spgist_idx on hoo使用spgist(小时); - !

    之后 VACUUM FULL ANALYZE hoo; 运行了两个查询:




    • Q1 :深夜,只找到 53行 / li>
    • Q2 :下午找到 2423行



    结果



    每个都有一个索引扫描当然,除了无索引

      index idx size Q1 Q2 
    ----------------- -------------------------------
    无索引41.24 ms 41.2 ms
    gist(shop_id,小时)8MB 14.71 ms 33.3 ms
    gist(hours,shop_id)12MB 0.37 ms 8.2 ms
    gist(小时)11MB 0.34 ms 5.1 ms
    spgist(小时)9MB 0.29 ms 2.0 ms - ! !




    • SP-GiST和GiST对于查询找到几个结果

      很少)。

    • SP-GiST随着结果数量的增加而变得更好,并且也更小。



    如果读取的数量多于您写入的数量(典型用例),请保持开头建议的排除约束,并创建一个附加的SP-GiST索引以优化读取性能。


    I'm in the RoR stack and I had to write some actual SQL to complete this query for all records that are "open", meaning that the current time is within the specified hours of operation. In the hours_of_operations table two integer columns opens_on and closes_on store a weekday, and two time fields opens_at and closes_at store the respective time of the day.

    I made a query that compares the current date and time to the stored values but I'm wondering if there is a way to cast to some sort of date type and have PostgreSQL do the rest?

    The meat of the query is:

    WHERE (
     (
    
     /* Opens in Future */
     (opens_on > 5 OR (opens_on = 5 AND opens_at::time > '2014-03-01 00:27:25.851655'))
     AND (
     (closes_on < opens_on AND closes_on > 5)
     OR ((closes_on = opens_on)
     AND (closes_at::time < opens_at::time AND closes_at::time > '2014-03-01 00:27:25.851655'))
     OR ((closes_on = 5)
     AND (closes_at::time > '2014-03-01 00:27:25.851655' AND closes_at::time < opens_at::time)))
     OR
    
     /* Opens in Past */
     (opens_on < 5 OR (opens_on = 5 AND opens_at::time < '2014-03-01 00:27:25.851655'))
     AND
     (closes_on > 5)
     OR
     ((closes_on = 5)
     AND (closes_at::time > '2014-03-01 00:27:25.851655'))
     OR (closes_on < opens_on)
     OR ((closes_on = opens_on)
     AND (closes_at::time < opens_at::time))
     )
    
     )
    

    Th reason for such dense complexity is that an hour of operation may wrap around the end of the week, for example, starting at noon on Sunday and going through 6 AM Monday. Since I store values in UTC, there are many cases in which local time of the user could wrap in a very strange way. The query above ensures that you could enter ANY two times of the week and we compensate for the wrapping.

    解决方案

    Table layout

    Re-design the table and store opening hours (hours of operation) as a set of tsrange (range of timestamp without time zone) values. Requires Postgres 9.2 or later.

    Pick a random week to stage your opening hours. I like the week:
    1996-01-01 (Monday) to 1996-01-07 (Sunday)
    That's the most recent leap year where Jan 1st conveniently happens to be a Monday. But it can be any random week for this case. Just be consistent.

    Install the additional module btree_gist first. Why?

    CREATE EXTENSION btree_gist;
    

    Create the table like this:

    CREATE TABLE hoo (
       hoo_id  serial PRIMARY KEY
     , shop_id int NOT NULL REFERENCES shop(shop_id)     -- reference to shop
     , hours   tsrange NOT NULL
     , CONSTRAINT hoo_no_overlap EXCLUDE USING gist (shop_id with =, hours WITH &&)
     , CONSTRAINT hoo_bounds_inclusive CHECK (lower_inc(hours) AND upper_inc(hours))
     , CONSTRAINT hoo_standard_week CHECK (hours <@ tsrange '[1996-01-01 0:0, 1996-01-08 0:0]')
    );
    

    The one column hours replaces all of your columns:

    opens_on, closes_on, opens_at, closes_at

    For instance, hours of operation from Wednesday, 18:30 to Thursday, 05:00 UTC are entered as:

    '[1996-01-03 18:30, 1996-01-04 05:00]'
    

    The exclusion constraint hoo_no_overlap prevents overlapping entries per shop. It is implemented with a GiST index, which also happens to support your query. Consider the chapter "Index and Performance" below discussing the indexing strategy.

    The check constraint hoo_bounds_inclusive enforces inclusive boundaries for your ranges, with two noteworthy consequences:

    • A point in time falling on lower or upper boundary exactly is always included.
    • Adjacent entries for the same shop are effectively disallowed. With inclusive borders, those would "overlap" and the exclusion constraint would raise an exception. Adjacent entries must be merged into a single row. Except when they wrap around Sunday midnight, in which case they must be split into two rows. See Tool 2 below.

    The check constraint hoo_standard_week enforces the outer bounds of the staging week with the "range is contained by" operator <@.

    With inclusive bounds, you have to observe a special / corner case where the time wraps around Sunday midnight:

    '1996-01-01 00:00+0' = '1996-01-08 00:00+0'
     Mon 00:00 = Sun 24:00 (= next Mon 00:00)
    

    You have to search for both timestamps at once. Here is a related case with exclusive upper bound that wouldn't exhibit this shortcoming:

    Function f_hoo_time(timestamptz)

    To "normalize" any given timestamp with time zone:

    CREATE OR REPLACE FUNCTION f_hoo_time(timestamptz)
      RETURNS timestamp AS
    $func$
    SELECT date '1996-01-01'
        + ($1 AT TIME ZONE 'UTC' - date_trunc('week', $1 AT TIME ZONE 'UTC'))
    $func$  LANGUAGE sql IMMUTABLE;
    

    The function takes timestamptz and returns timestamp. It adds the elapsed interval of the respective week ($1 - date_trunc('week', $1) in UTC time (!) to the starting point of our staging week. (date + interval produces timestamp.)

    Function f_hoo_hours(timestamptz, timestamptz)

    To normalize ranges and split those crossing Mon 00:00. This function takes any interval (as two timestamptz) and produces one or two normalized tsrange values. It covers any legal input and disallows the rest:

    CREATE OR REPLACE FUNCTION f_hoo_hours(_from timestamptz, _to timestamptz)
      RETURNS TABLE (hoo_hours tsrange) AS
    $func$
    DECLARE
       ts_from timestamp := f_hoo_time(_from);
       ts_to   timestamp := f_hoo_time(_to);
    BEGIN
       -- test input for sanity (optional)
       IF _to <= _from THEN
          RAISE EXCEPTION '%', '_to must be later than _from!';
       ELSIF _to > _from + interval '1 week' THEN
          RAISE EXCEPTION '%', 'Interval cannot span more than a week!';
       END IF;
    
       IF ts_from > ts_to THEN  -- split range at Mon 00:00
          RETURN QUERY
          VALUES (tsrange('1996-01-01 0:0', ts_to  , '[]'))
               , (tsrange(ts_from, '1996-01-08 0:0', '[]'));
       ELSE                     -- simple case: range in standard week
          hoo_hours := tsrange(ts_from, ts_to, '[]');
          RETURN NEXT;
       END IF;
    
       RETURN;
    END
    $func$  LANGUAGE plpgsql IMMUTABLE COST 1000 ROWS 1;
    

    To INSERT a single input row:

    INSERT INTO hoo(shop_id, hours)
    SELECT 123, f_hoo_hours('2016-01-11 00:00+04', '2016-01-11 08:00+04');
    

    This results in two rows if the range needs splitting at Mon 00:00.

    To INSERT multiple input rows:

    INSERT INTO hoo(shop_id, hours)
    SELECT id, hours
    FROM  (
       VALUES (7, timestamp '2016-01-11 00:00', timestamp '2016-01-11 08:00')
            , (8, '2016-01-11 00:00', '2016-01-11 08:00')
       ) t(id, f, t), f_hoo_hours(f, t) hours;  -- LATERAL join
    

    About the implicit LATERAL join:

    Query

    With the adjusted design, your whole big, complex, expensive query can be replaced with ... this:

    SELECT *
    FROM hoo
    WHERE hours @> f_hoo_time(now());

    For a little suspense I put a spoiler plate over the solution. Move the mouse over it.

    The query is backed by said GiST index and fast, even for big tables.

    SQL Fiddle (with more examples).

    If you want to calculate total opening hours (per shop), here is a recipe:

    Index and Performance

    The containment operator for range types can be supported with a GiST or SP-GiST index. Either can be used to implement an exclusion constraint, but only GiST supports multicolumn indexes:

    Currently, only the B-tree, GiST, GIN, and BRIN index types support multicolumn indexes.

    And the order of index columns matters:

    A multicolumn GiST index can be used with query conditions that involve any subset of the index's columns. Conditions on additional columns restrict the entries returned by the index, but the condition on the first column is the most important one for determining how much of the index needs to be scanned. A GiST index will be relatively ineffective if its first column has only a few distinct values, even if there are many distinct values in additional columns.

    So we have conflicting interests here. For big tables, there will be many more distinct values for shop_id than for hours.

    • A GiST index with leading shop_id is faster to write and to enforce the exclusion constraint.
    • But we are searching the hours column in our query. Having that column first would be better.
    • If we need to look up shop_id in other queries, a plain btree index is much faster for that.
    • To top it off, I found an SP-GiST index on just hours to be fastest for the query.

    Benchmark

    My script to generate dummy data:

    INSERT INTO hoo(shop_id, hours)
    SELECT id, hours
    FROM   generate_series(1, 30000) id, generate_series(0, 6) d
         , f_hoo_hours(((date '1996-01-01' + d) + interval  '4h' + interval '15 min' * trunc(32 * random()))            AT TIME ZONE 'UTC'
                     , ((date '1996-01-01' + d) + interval '12h' + interval '15 min' * trunc(64 * random() * random())) AT TIME ZONE 'UTC') AS hours
    WHERE  random() > .33;
    

    Results in 141k randomly generated rows, 30k distinct shop_id, 12k distinct hours. (Typically the difference will be greater.) Table size 8 MB.

    I dropped and recreated the exclusion constraint:

    ALTER TABLE hoo ADD CONSTRAINT hoo_no_overlap
       EXCLUDE USING gist (shop_id WITH =, hours WITH &&);  --  4.4 sec !!
    
    ALTER TABLE hoo ADD CONSTRAINT hoo_no_overlap
       EXCLUDE USING gist (hours WITH &&, shop_id WITH =);  -- 16.4 sec
    

    shop_id first is ~ 4x faster.

    In addition, I tested two more for read performance:

    CREATE INDEX hoo_hours_gist_idx   on hoo USING gist (hours);
    CREATE INDEX hoo_hours_spgist_idx on hoo USING spgist (hours);  -- !!
    

    After VACUUM FULL ANALYZE hoo;, I ran two queries:

    • Q1: late night, finding only 53 rows
    • Q2: in the afternoon, finding 2423 rows.

    Results

    Got an index-only scan for each (except for "no index", of course):

    index                 idx size  Q1         Q2
    ------------------------------------------------
    no index                        41.24 ms   41.2 ms 
    gist (shop_id, hours)    8MB    14.71 ms   33.3 ms
    gist (hours, shop_id)   12MB     0.37 ms    8.2 ms
    gist (hours)            11MB     0.34 ms    5.1 ms
    spgist (hours)           9MB     0.29 ms    2.0 ms  -- !!
    

    • SP-GiST and GiST are on par for queries finding few results (GiST is even faster for very few).
    • SP-GiST scales better with a growing number of results, and is smaller, too.

    If you read a lot more than you write (typical use case), keep the exclusion constraint as suggested at the outset and create an additional SP-GiST index to optimize read performance.

    这篇关于在PostgreSQL中执行这个小时的操作查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆