在PostgreSQL中存储和查询间隔树 [英] Storing and querying interval tree in PostgreSQL

查看:78
本文介绍了在PostgreSQL中存储和查询间隔树的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一些要在Postgres中存储和查询的数据集(以示例为例)。

例如:

数据集A:1, 7,9-13

数据集B:1,7,10

我要运行查询,例如:

1. B是B的子集吗?一个? (是)

2. A和B的交点是什么? (B)

数据集可以包含数千个整数范围。

我想知道是否有一些扩展支持这种数据分析。

任何示例/链接将不胜感激。

I have some data sets (lets assume it's integers for the example) which I want to store and query in Postgres.
For example:
Data set A: 1,7,9-13
Data set B: 1, 7, 10
I want to run query such as:
1. Is B a subset of A? (Yes)
2. What is the intersection of A and B? (B)
The data sets can include thousands of integer ranges.
I was wondering if there is some extension which supports such data analysis.
Any examples / links will be appreciated.

推荐答案

您可以使用范围数据类型并将每个不连续的类型存储在一行中。

You could use the range data types and store each disjoint type in a row.

为您的示例:

-- The table
CREATE TABLE sets(id text, range int4range);
-- Values of set A
INSERT INTO sets VALUES('A', '[1,1]'),('A','[7,7]'),('A','[9,13]');
-- Values of set B
INSERT INTO sets VALUES('B','[1,1]'),('B','[7,7]'),('B','[10,10]');

要检查B是否为A的子集,可以将A的范围包含的所有元组都加入B的范围:

To check if B is a subset of A, you can join both with all tuples that A's range contains B's range:

 SELECT b.range
 FROM sets b JOIN sets a
     ON a.range @> b.range
 WHERE a.id='A' AND b.id='B'

这样,您可以检查集合B中的所有值是否都在上述结果中(这意味着B的所有范围都包含在A的至少一个范围中):

With that, you can check if all the values from set B are in the above result (which will mean that all the ranges of B is contained by at least one range of A):

 SELECT NOT EXISTS(
     SELECT 1 FROM sets q WHERE q.id='B' AND q.range NOT IN (
         SELECT b.range
         FROM sets b JOIN sets a
             ON a.range @> b.range
         WHERE a.id='A' AND b.id='B'
     ));

要获得交集,您可以交叉加入两者并排除空的:

To get the intersection, you can cross join both and exclude the empty ones:

 SELECT * FROM (
     SELECT a.range * b.range AS intersec
     FROM sets a CROSS JOIN sets b WHERE  a.id='A' AND b.id='B'
 ) i WHERE NOT isempty(i.intersec);

这种方法的一个问题是,必须通过不同的元组仅保留不相交的范围S。例如,集合中的范围[1,5]和[4,7]必须位于仅包含[1,7]的元组中。为了确保这一点,您可以将它们插入到临时表中(在插入或更新时),它们将表本身与重叠的元组交叉连接,然后将其连接并保持其他状态。

One problem about this approach, is that you must keep only disjoint rangeS through different tuples. For instance, range [1,5] and [4,7] from a set must reside in a tuple with [1,7] only. To make sure of it, you can insert them into a temporary table (while inserting or updating), them cross join the table itself with tuples that overlaps and them join those and keep the others the way they are.

这篇关于在PostgreSQL中存储和查询间隔树的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆