2列范围查询 [英] Range queries on 2 columns
问题描述
我有一个非常庞大的书架(大约有1亿张),其中有书架信息。
I have very huge table Shelve(approximately 100 millions) which has Shelve info for books.
书架
ShevleID RangeStart RangeEnd
----------------------------------------
1 1 100
2 200 500
3 501 1000
每本书都有唯一的编号BookID。假设您有一本BookID为50的书。
然后该书必须保留在书架1中,因为50介于1到100之间。
Each book has unique number BookID given to it. Lets say you have a book with BookID 50. Then Book must be kept in Shelve 1 because 50 lies between 1 and 100.
Books
BookID BookName
---------------------------
1 Book1
2 Book2
.
.
50 Book3
我的查询是这样的-
SELECT
BookID,
BookName,
ShelveID
FROM
Book B
LEFT JOIN
Shelve S
ON B.BookID
BETWEEN
S.RenageStart
AND
S.RangeEND
此查询非常慢,因为查询一次只能使用索引RangeStart或RangeEnd之一。
This query is super slow because query is able to use index only one of the columns either RangeStart or RangeEnd at a time.
我已经尝试了这5个选项-
I have already tried these 5 options-
-
在StartIP上创建索引
Create Index on StartIP
在EndIP上创建索引
Create Index on EndIP
在StartIP(包含EndIP列)上创建包含的索引
Create included Index on StartIP (included column EndIP)
在EndIP上创建包含的索引(包含在StartIP列中)
Create included Index on EndIP (included column StartIP)
在StartIP上创建索引,EndIP
Create Index on StartIP,EndIP
有人可以建议我一些建议吗?
Can someone please suggest me some approach to achieve this?
推荐答案
如果您希望每本书都拥有一个货架价值,可以尝试:
If you want one shelve value for each book, you can try:
SELECT b.*,
(SELECT TOP 1 s.ShelveId
FROM Shelve S
WHERE b.BookId >= s.RangeStart
ORDER BY s.RangeStart DESC
) as ShelveId
FROM Book B;
这应该有效地利用 Shelve(RangeStart,ShelveId)上的索引
。
这假设您要一个 ShelveId
,并且书本范围没有重叠
This assumes that you want one ShelveId
and the book ranges are not overlapping.
我很好奇您真正的应用程序是什么。据我所知,没有图书馆拥有数亿本书。
I am curious what your real application is. No libraries (to the best of my knowledge) have hundreds of millions of books.
编辑:
您可以使用 case
语句处理丢失的 ShelveId
:
You can handle the missing ShelveId
with a case
statement:
SELECT b.*,
(SELECT TOP 1 (case when b.BookId between s.RangeStart and s.RangeEnd then s.ShelveId end)
FROM Shelve S
WHERE b.BookId >= s.RangeStart
ORDER BY s.RangeStart DESC
) as ShelveId
FROM Book B;
如果其他假设均成立,则可能会解决您的问题。
If the other assumptions are true, this may solve your problem.
编辑II:
如果需要其他属性,请尝试交叉套用
。它应该具有类似的性能特征:
If you want other attributes, then try cross apply
. It should have similar performance characteristics:
SELECT b.*,
s.*
FROM Book B CROSS APPLY
(SELECT TOP 1 (case when b.BookId between s.RangeStart and s.RangeEnd then s.ShelveId end) as RangeStart, . . .
FROM Shelve S
WHERE b.BookId >= s.RangeStart
ORDER BY s.RangeStart DESC
) s
现在,进行一些实验。我想写:
Now, comes a bit of experimentation. I would like to write:
SELECT b.*,
s.*
FROM Book B CROSS APPLY
(SELECT TOP 1 s.*
FROM Shelve S
WHERE b.BookId >= s.RangeStart and b.BookId <= s.RangeEnd
ORDER BY s.RangeStart DESC
) s
但是,这可能会混淆优化引擎并阻止使用索引。如果可行,那就太好了。如果不起作用,我建议您将第一个版本与每个变量的 case
一起使用。或者,使用相关的子查询版本并连接回主键上的 Shelve
表。
But, this might confuse the optimization engine and prevent the use of the index. If it works, great. If it doesn't work, I would suggest either using the first version with a case
for every variable. Or, use the correlated subquery version and join back to the Shelve
table on a primary key.
这篇关于2列范围查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!