在SQL中有效地连接间隔范围 [英] Efficiently joining over interval ranges in SQL

查看:83
本文介绍了在SQL中有效地连接间隔范围的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有两个表,如下所示(数据来自此 SO帖子):

Suppose I have two tables as follows (data taken from this SO post):

d1:

 x start end
 a     1   3
 b     5  11
 c    19  22
 d    30  39
 e     7  25

d2:

 x pos
 a   2
 a   3
 b   3
 b  12
 c  20
 d  52
 e  10

两个表中的第一行都是列标题.我想提取d2中的所有行,其中列xd1匹配,而pos1属于(包括边界值)d1startend列.也就是说,我想要结果:

The first row in both tables are column headers. I'd like to extract all the rows in d2 where column x matches with d1 and pos1 falls within (including boundary values) d1's start and end columns. That is, I'd like the result:

 x pos start  end
 a   2     1    3
 a   3     1    3
 c  20    19   22
 e  10     7   25

到目前为止,我看到的方式是:

The way I've seen this done so far is:

SELECT * FROM d1 JOIN d2 USING (x) WHERE pos BETWEEN start AND end

但是我不确定的是,此操作是否尽可能有效(即内部优化)完成.例如,首先计算整个联接实际上不是可扩展的方法恕我直言(就速度和内存而言).

But what is not clear to me is if this operation is done as efficient as it can be (i.e., optimised internally). For example, computing the entire join first is not really a scalable approach IMHO (in terms of both speed and memory).

是否还有其他有效的查询优化方法(例如:使用间隔树)或其他算法在我可以利用的SQL中可以有效地处理范围(再次在速度和内存方面)?是否使用SQLite,PostgreSQL,mySQL等都没有关系.

Are there any other efficient query optimisations (ex: using interval trees) or other algorithms that can handle ranges efficiently (again, in terms of both speed and memory) in SQL that I can make use of? It doesn't matter if it's using SQLite, PostgreSQL, mySQL etc..

在SQL中执行此操作的最有效方法是什么?

What is the most efficient way to perform this operation in SQL?

非常感谢您.

推荐答案

不知道它在内部如何工作,但是根据情况,我建议使用一个表来推出" d1中的所有值然后加入那个.通过这种方式,查询引擎可以精确地"精确定位正确的记录,而不必查找与所要查找的值匹配的边界组合.

Not sure how it all works out internally, but depending on the situation I would advice to play around with a table that 'rolls out' all the values from d1 and then join on that one. This way the query engine can pinpoint the right record 'exactly' instead of having to find a combination of boundaries that match the value being looked for.

例如

x value
a  1
a  2
a  3
b  5
b  6
b  7
b  8
b  9
b 10
b 11
c 19 etc..

给定值列(**)上的索引,它应该比在原始d1表恕我直言之间使用BETWEEN开始和结束连接快很多.

given an index on the value column (**), this should be quite a bit faster than joining with the BETWEEN start AND end on the original d1 table IMHO.

当然,每次对d1进行更改时,您也需要调整推出的表格(触发?).如果这种情况经常发生,您将花费更多的时间来更新推出的表,而不是一开始就花了更多的时间!另外,如果某些间隔确实很大,那么这可能会很快占用大量的(磁盘)空间.并且还假设我们不需要查找非整数(例如,如果我们查找值3.14会怎样?)

Off course, each time you make changes to d1, you'll need to adjust the rolled out table too (trigger?). If this happens frequently you'll spend more time updating the rolled out table than you gained in the first place! Additionally, this might take quite a bit of (disk)space quickly if some of the intervals are really big; and also, this assumes we don't need to look for non-whole numbers (e.g. what if we look for the value 3.14 ?)

(您可以考虑在此处(值,x)上尝试一个唯一的值...)

(You might consider experimenting with a unique one on (value, x) here...)

这篇关于在SQL中有效地连接间隔范围的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆