在PostgreSQL中执行这个小时的操作查询 [英] Perform this hours of operation query in PostgreSQL
问题描述
我在RoR堆栈中,我不得不写一些实际的SQL来完成对所有打开记录的查询,这意味着当前时间在指定的操作时间内。在 hours_of_operations
表两个整数
列 opens_on
和 closes_on
存储一个工作日,两个时间
字段 opens_at
和<$
我做了一个查询,将当前日期和时间与存储的值进行比较但是我想知道是否有一种方法来转换到某种日期类型,并且PostgreSQL做了其余的工作?
查询的核心是:
WHERE(
pre>
(
/ *打开未来* /
(opens_on> ; 5 OR(opens_on = 5 AND opens_at :: time>'2014-03-01 00:27:25.851655'))
AND(
(closes_on< opens_on AND closes_on> 5)
OR((closes_on = opens_on)
AND(closes_at :: time< opens_at :: time AND closes_at :: time>'2014-03-01 00:27:25.851655'))
OR((closes_on = 5)
AND(closes_at :: time>'2014-03-01 00:27:25.851655'AND closes_at :: time< opens_at :: time)))
OR
/ *在过去打开* /
(opens_on< 5 OR(opens_on = 5 AND opens_at :: time<'2014-03-01 00:27:25.851655'))
AND
(closes_on> 5)
OR $ b b((closes_on = 5)
AND(closes_at :: time>'2014-03-01 00:27:25.851655'))
OR(closes_on< opens_on)
OR (closes_on = opens_on)
AND(closes_at :: time< opens_at :: time))
)
)
这种密集复杂性的原因是一个小时的操作可能在周末结束,例如,从星期日中午开始,经过上午6点星期一。由于我以UTC存储值,在许多情况下,用户的本地时间可能以非常奇怪的方式包装。
解决方案
表格布局 h2>
重新设计表和商店营业时间(营业时间)为
tsrange
(不包含时区的时间范围) 。需要Postgres 9.2或更高版本。
选择一个随机周来开始营业时间。我喜欢这个星期:
1996-01-01(星期一)到 1996-01-07(星期日)
这是最近的闰年,1月1日方便地是一个星期一。但它可以是任何随机周的这种情况。
安装附加模块
btree_gist
。 为什么?CREATE EXTENSION btree_gist;
创建如下表格:
CREATE TABLE hoo(
hoo_id serial PRIMARY KEY
,shop_id int NOT NULL REFERENCES shop(shop_id) - 商店参考
,小时tsrange NOT NULL
,CONSTRAINT hoo_no_overlap EXCLUDE使用gist(shop_id与=,小时WITH&&)
,CONSTRAINT hoo_bounds_inclusive CHECK(lower_inc(小时)AND upper_inc(小时))
,CONSTRAINT hoo_standard_week CHECK < @ tsrange'[1996-01-01 0:0,1996-01-08 0:0]')
);
列
小时
替换所有列:
opens_on,closes_on,opens_at,closes_at例如,营业时间从星期三,18:30到星期四,05:00 UTC输入为:
'[1996-01-03 18:30,1996-01-04 05 :00]'
排除约束
hoo_no_overlap
可防止每个商店的重复条目。它使用 GiST索引实现,这也支持您的查询。
检查约束
hoo_bounds_inclusive 请参阅下面的索引和性能 / code>
为您的范围强制包含边界,具有两个值得注意的后果:
- 相同商店的相邻条目实际上是不允许的。具有包容性边界,这些将重叠,排除约束将引起例外。相邻条目必须合并到单个行中。除非他们围绕星期日午夜,在这种情况下,他们必须分成两行。 code> 使用范围由运算符
< @
包含。
的范围,您必须遵守周末午夜周末时间的特殊情况:
'1996-01-01 00:00 + 0'='1996-01-08 00:00 + 0'
星期一00:00 =星期日24:00(=下一个星期一00:00 )
您必须同时搜索两个时间戳。以下是具有 独占 上限的相关案例,不会显示此缺点:
功能
f_hoo_time(timestamptz)
使用时区标准化任何给定的
时间戳:
CREATE OR REPLACE FUNCTION f_hoo_time(timestamptz)
RETURNS timestamp AS
$ func $
SELECT date'1996-01-01'
+ AT TIME ZONE'UTC' - date_trunc('week',$ 1 AT TIME ZONE'UTC'))
$ func $ LANGUAGE sql IMMUTABLE;
函数
timestamptz
c $ c> timestamp 。它将在UTC时间(!)中的相应周($ 1 - date_trunc('week',$ 1)
的已用间隔添加到我们的临时周的起始点code> date +interval
产生timestamp
。)
函数
f_hoo_hours(timestamptz,timestamptz)
这个函数需要任何间隔(两个
timestamptz
),并产生一个或两个标准化的tsrange
包括 任何 合法输入,并禁止其余部分:CREATE OR REPLACE FUNCTION f_hoo_hours(_from timestamptz,_to timestamptz)
RETURNS TABLE(hoo_hours tsrange)AS
$ func $
DECLARE
ts_from timestamp:= f_hoo_time(_from);
ts_to timestamp:= f_hoo_time(_to);
BEGIN
- 测试输入sanity(可选)
IF _to< = _from THEN
RAISE EXCEPTION'%' '_to必须晚于_from!';
ELSIF _to> _from + interval'1 week'THEN
RAISE EXCEPTION'%','间隔不能超过一周!
END IF;
如果ts_from> ts_to THEN - 分割范围为星期一00:00
返回查询
VALUES(tsrange('1996-01-01 0:0',ts_to,'[]'))
, tsrange(ts_from,'1996-01-08 0:0','[]'));
ELSE - 简单情况:标准周的范围
hoo_hours:= tsrange(ts_from,ts_to,'[]');
RETURN NEXT;
END IF;
RETURN;
END
$ func $ LANGUAGE plpgsql IMMUTABLE COST 1000 ROWS 1;
到
INSERT
/ em>输入行:INSERT INTO hoo(shop_id,小时)
SELECT 123,f_hoo_hours 01-11 00:00 + 04','2016-01-11 08:00 + 04');
如果范围需要在Mon 00分割,这会导致两行 00。
到
INSERT
多个输入行:INSERT INTO hoo(shop_id,hours)
SELECT id,hours
FROM(
VALUES(7,timestamp' -01-11 00:00',timestamp'2016-01-11 08:00')
,(8,'2016-01-11 00:00','2016-01-11 08:00' )
)t(id,f,t),f_hoo_hours(f,t)小时; - LATERAL join
关于隐式
LATERAL
加入:
查询
使用调整后的设计,您的整个大,复杂,昂贵的查询 这:
SELECT *
FROM hoo
WHERE小时@> ; f_hoo_time(now());
扰流板在解决方案。
查询是由GiST索引支持的,即使对于大表也是如此。
如果您想计算总开放时间(每间商店),以下是一个配方:
索引和性能
包含操作符范围类型可以通过 GiST a>或 SP-GiST 索引。两者都可用于实施排除约束,但只有GiST支持多列索引:
目前,只有B树,GiST,GIN和BRIN索引类型支持多列索引。
并且索引列顺序问题:
多列GiST索引可用于查询条件
涉及索引列的任何子集。额外的
列的条件限制索引返回的条目,但是第一列上的条件
是确定需要扫描索引的
的最重要的条目。如果第一列只有几个不同的值,GiST索引将相对
无效,即使
,如果附加列中有许多不同的值。
因此,我们在这里有冲突的利益。对于大表, shop_id
比小时
有更多不同的值。
- 具有领导
shop_id
的GiST索引写入速度更快,并强制实施排除约束。 - 但是我们在查询中搜索
hours
列。 - 如果我们需要在其他查询中查找
shop_id
,那么纯btree索引会更快 - 为了优先,我在
小时
上找到了 SP-GiST >
p>我的脚本生成哑元数据:INSERT INTO hoo(shop_id,小时)
SELECT id,hours
FROM generate_series(1,30000)id,generate_series(0,6)d
,f_hoo_hours((date'1996-01-01'+ d)+ interval'4h'+ interval '15 min' * trunc(32 * random()))AT TIME ZONE'UTC'
,((date'1996-01-01'+ d)+ interval'12h'+ interval '15 min'* trunc random()* random()))AT TIME ZONE'UTC')AS小时
WHERE random()> .33;
结果为141k个随机生成的行,30k不同
shop_id
,12k distinct小时
。 (通常差异会更大。)表大小8 MB。
我删除并重新创建了排除约束:
ALTER TABLE hoo ADD CONSTRAINT hoo_no_overlap
EXCLUDE使用gist(shop_id WITH =,小时WITH&& - 4.4秒!
ALTER TABLE hoo ADD CONSTRAINT hoo_no_overlap
EXCLUDE USING gist(hours with&& shop_id WITH =); - 16.4秒
shop_id
4倍速。
此外,我还测试了两个读取性能:
CREATE INDEX hoo_hours_gist_idx on hoo USING gist(hours);
CREATE INDEX hoo_hours_spgist_idx on hoo使用spgist(小时); - !
之后
VACUUM FULL ANALYZE hoo;
运行了两个查询:
- Q1 :深夜,只找到 53行 / li>
- Q2 :下午找到 2423行。
结果
每个都有一个索引扫描当然,除了无索引
index idx size Q1 Q2
----------------- -------------------------------
无索引41.24 ms 41.2 ms
gist(shop_id,小时)8MB 14.71 ms 33.3 ms
gist(hours,shop_id)12MB 0.37 ms 8.2 ms
gist(小时)11MB 0.34 ms 5.1 ms
spgist(小时)9MB 0.29 ms 2.0 ms - ! !
- SP-GiST和GiST对于查询找到几个结果很少)。
- SP-GiST随着结果数量的增加而变得更好,并且也更小。
如果读取的数量多于您写入的数量(典型用例),请保持开头建议的排除约束,并创建一个附加的SP-GiST索引以优化读取性能。
I'm in the RoR stack and I had to write some actual SQL to complete this query for all records that are "open", meaning that the current time is within the specified hours of operation. In the
hours_of_operations
table twointeger
columnsopens_on
andcloses_on
store a weekday, and twotime
fieldsopens_at
andcloses_at
store the respective time of the day.I made a query that compares the current date and time to the stored values but I'm wondering if there is a way to cast to some sort of date type and have PostgreSQL do the rest?
The meat of the query is:
WHERE ( ( /* Opens in Future */ (opens_on > 5 OR (opens_on = 5 AND opens_at::time > '2014-03-01 00:27:25.851655')) AND ( (closes_on < opens_on AND closes_on > 5) OR ((closes_on = opens_on) AND (closes_at::time < opens_at::time AND closes_at::time > '2014-03-01 00:27:25.851655')) OR ((closes_on = 5) AND (closes_at::time > '2014-03-01 00:27:25.851655' AND closes_at::time < opens_at::time))) OR /* Opens in Past */ (opens_on < 5 OR (opens_on = 5 AND opens_at::time < '2014-03-01 00:27:25.851655')) AND (closes_on > 5) OR ((closes_on = 5) AND (closes_at::time > '2014-03-01 00:27:25.851655')) OR (closes_on < opens_on) OR ((closes_on = opens_on) AND (closes_at::time < opens_at::time)) ) )
Th reason for such dense complexity is that an hour of operation may wrap around the end of the week, for example, starting at noon on Sunday and going through 6 AM Monday. Since I store values in UTC, there are many cases in which local time of the user could wrap in a very strange way. The query above ensures that you could enter ANY two times of the week and we compensate for the wrapping.
解决方案Table layout
Re-design the table and store opening hours (hours of operation) as a set of
tsrange
(range of timestamp without time zone) values. Requires Postgres 9.2 or later.Pick a random week to stage your opening hours. I like the week:
1996-01-01 (Monday) to 1996-01-07 (Sunday)
That's the most recent leap year where Jan 1st conveniently happens to be a Monday. But it can be any random week for this case. Just be consistent.Install the additional module
btree_gist
first. Why?CREATE EXTENSION btree_gist;
Create the table like this:
CREATE TABLE hoo ( hoo_id serial PRIMARY KEY , shop_id int NOT NULL REFERENCES shop(shop_id) -- reference to shop , hours tsrange NOT NULL , CONSTRAINT hoo_no_overlap EXCLUDE USING gist (shop_id with =, hours WITH &&) , CONSTRAINT hoo_bounds_inclusive CHECK (lower_inc(hours) AND upper_inc(hours)) , CONSTRAINT hoo_standard_week CHECK (hours <@ tsrange '[1996-01-01 0:0, 1996-01-08 0:0]') );
The one column
hours
replaces all of your columns:opens_on, closes_on, opens_at, closes_atFor instance, hours of operation from Wednesday, 18:30 to Thursday, 05:00 UTC are entered as:
'[1996-01-03 18:30, 1996-01-04 05:00]'
The exclusion constraint
hoo_no_overlap
prevents overlapping entries per shop. It is implemented with a GiST index, which also happens to support your query. Consider the chapter "Index and Performance" below discussing the indexing strategy.The check constraint
hoo_bounds_inclusive
enforces inclusive boundaries for your ranges, with two noteworthy consequences:- A point in time falling on lower or upper boundary exactly is always included.
- Adjacent entries for the same shop are effectively disallowed. With inclusive borders, those would "overlap" and the exclusion constraint would raise an exception. Adjacent entries must be merged into a single row. Except when they wrap around Sunday midnight, in which case they must be split into two rows. See Tool 2 below.
The check constraint
hoo_standard_week
enforces the outer bounds of the staging week with the "range is contained by" operator<@
.With inclusive bounds, you have to observe a special / corner case where the time wraps around Sunday midnight:
'1996-01-01 00:00+0' = '1996-01-08 00:00+0' Mon 00:00 = Sun 24:00 (= next Mon 00:00)
You have to search for both timestamps at once. Here is a related case with exclusive upper bound that wouldn't exhibit this shortcoming:
Function
f_hoo_time(timestamptz)
To "normalize" any given
timestamp with time zone
:CREATE OR REPLACE FUNCTION f_hoo_time(timestamptz) RETURNS timestamp AS $func$ SELECT date '1996-01-01' + ($1 AT TIME ZONE 'UTC' - date_trunc('week', $1 AT TIME ZONE 'UTC')) $func$ LANGUAGE sql IMMUTABLE;
The function takes
timestamptz
and returnstimestamp
. It adds the elapsed interval of the respective week($1 - date_trunc('week', $1)
in UTC time (!) to the starting point of our staging week. (date
+interval
producestimestamp
.)Function
f_hoo_hours(timestamptz, timestamptz)
To normalize ranges and split those crossing Mon 00:00. This function takes any interval (as two
timestamptz
) and produces one or two normalizedtsrange
values. It covers any legal input and disallows the rest:CREATE OR REPLACE FUNCTION f_hoo_hours(_from timestamptz, _to timestamptz) RETURNS TABLE (hoo_hours tsrange) AS $func$ DECLARE ts_from timestamp := f_hoo_time(_from); ts_to timestamp := f_hoo_time(_to); BEGIN -- test input for sanity (optional) IF _to <= _from THEN RAISE EXCEPTION '%', '_to must be later than _from!'; ELSIF _to > _from + interval '1 week' THEN RAISE EXCEPTION '%', 'Interval cannot span more than a week!'; END IF; IF ts_from > ts_to THEN -- split range at Mon 00:00 RETURN QUERY VALUES (tsrange('1996-01-01 0:0', ts_to , '[]')) , (tsrange(ts_from, '1996-01-08 0:0', '[]')); ELSE -- simple case: range in standard week hoo_hours := tsrange(ts_from, ts_to, '[]'); RETURN NEXT; END IF; RETURN; END $func$ LANGUAGE plpgsql IMMUTABLE COST 1000 ROWS 1;
To
INSERT
a single input row:INSERT INTO hoo(shop_id, hours) SELECT 123, f_hoo_hours('2016-01-11 00:00+04', '2016-01-11 08:00+04');
This results in two rows if the range needs splitting at Mon 00:00.
To
INSERT
multiple input rows:INSERT INTO hoo(shop_id, hours) SELECT id, hours FROM ( VALUES (7, timestamp '2016-01-11 00:00', timestamp '2016-01-11 08:00') , (8, '2016-01-11 00:00', '2016-01-11 08:00') ) t(id, f, t), f_hoo_hours(f, t) hours; -- LATERAL join
About the implicit
LATERAL
join:Query
With the adjusted design, your whole big, complex, expensive query can be replaced with ... this:
SELECT *
FROM hoo
WHERE hours @> f_hoo_time(now());
For a little suspense I put a spoiler plate over the solution. Move the mouse over it.
The query is backed by said GiST index and fast, even for big tables.
SQL Fiddle (with more examples).
If you want to calculate total opening hours (per shop), here is a recipe:
Index and Performance
The containment operator for range types can be supported with a GiST or SP-GiST index. Either can be used to implement an exclusion constraint, but only GiST supports multicolumn indexes:
Currently, only the B-tree, GiST, GIN, and BRIN index types support multicolumn indexes.
And the order of index columns matters:
A multicolumn GiST index can be used with query conditions that involve any subset of the index's columns. Conditions on additional columns restrict the entries returned by the index, but the condition on the first column is the most important one for determining how much of the index needs to be scanned. A GiST index will be relatively ineffective if its first column has only a few distinct values, even if there are many distinct values in additional columns.
So we have conflicting interests here. For big tables, there will be many more distinct values for
shop_id
than forhours
.- A GiST index with leading
shop_id
is faster to write and to enforce the exclusion constraint. - But we are searching the
hours
column in our query. Having that column first would be better. - If we need to look up
shop_id
in other queries, a plain btree index is much faster for that. - To top it off, I found an SP-GiST index on just
hours
to be fastest for the query.
Benchmark
My script to generate dummy data:
INSERT INTO hoo(shop_id, hours) SELECT id, hours FROM generate_series(1, 30000) id, generate_series(0, 6) d , f_hoo_hours(((date '1996-01-01' + d) + interval '4h' + interval '15 min' * trunc(32 * random())) AT TIME ZONE 'UTC' , ((date '1996-01-01' + d) + interval '12h' + interval '15 min' * trunc(64 * random() * random())) AT TIME ZONE 'UTC') AS hours WHERE random() > .33;
Results in 141k randomly generated rows, 30k distinct
shop_id
, 12k distincthours
. (Typically the difference will be greater.) Table size 8 MB.I dropped and recreated the exclusion constraint:
ALTER TABLE hoo ADD CONSTRAINT hoo_no_overlap EXCLUDE USING gist (shop_id WITH =, hours WITH &&); -- 4.4 sec !! ALTER TABLE hoo ADD CONSTRAINT hoo_no_overlap EXCLUDE USING gist (hours WITH &&, shop_id WITH =); -- 16.4 sec
shop_id
first is ~ 4x faster.In addition, I tested two more for read performance:
CREATE INDEX hoo_hours_gist_idx on hoo USING gist (hours); CREATE INDEX hoo_hours_spgist_idx on hoo USING spgist (hours); -- !!
After
VACUUM FULL ANALYZE hoo;
, I ran two queries:- Q1: late night, finding only 53 rows
- Q2: in the afternoon, finding 2423 rows.
Results
Got an index-only scan for each (except for "no index", of course):
index idx size Q1 Q2 ------------------------------------------------ no index 41.24 ms 41.2 ms gist (shop_id, hours) 8MB 14.71 ms 33.3 ms gist (hours, shop_id) 12MB 0.37 ms 8.2 ms gist (hours) 11MB 0.34 ms 5.1 ms spgist (hours) 9MB 0.29 ms 2.0 ms -- !!
- SP-GiST and GiST are on par for queries finding few results (GiST is even faster for very few).
- SP-GiST scales better with a growing number of results, and is smaller, too.
If you read a lot more than you write (typical use case), keep the exclusion constraint as suggested at the outset and create an additional SP-GiST index to optimize read performance.
这篇关于在PostgreSQL中执行这个小时的操作查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
- Q1 :深夜,只找到 53行 / li>