2列范围查询 [英] Range queries on 2 columns

查看：77 发布时间：2020/10/16 2:12:19 sql sql-server database query-optimization

本文介绍了2列范围查询的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个非常庞大的书架（大约有1亿张），其中有书架信息。

I have very huge table Shelve(approximately 100 millions) which has Shelve info for books.

书架

ShevleID     RangeStart      RangeEnd  
----------------------------------------
   1               1           100
   2             200           500
   3             501           1000

每本书都有唯一的编号BookID。假设您有一本BookID为50的书。
然后该书必须保留在书架1中，因为50介于1到100之间。

Each book has unique number BookID given to it. Lets say you have a book with BookID 50. Then Book must be kept in Shelve 1 because 50 lies between 1 and 100.

Books

BookID     BookName
---------------------------
   1       Book1
   2       Book2
   .
   .
  50       Book3

我的查询是这样的-

SELECT 
    BookID, 
    BookName, 
    ShelveID
FROM 
    Book B
LEFT JOIN  
    Shelve S 
      ON B.BookID 
                BETWEEN 
                       S.RenageStart 
                       AND
                       S.RangeEND

此查询非常慢，因为查询一次只能使用索引RangeStart或RangeEnd之一。

This query is super slow because query is able to use index only one of the columns either RangeStart or RangeEnd at a time.

我已经尝试了这5个选项-

I have already tried these 5 options-

在StartIP上创建索引

Create Index on StartIP

在EndIP上创建索引

Create Index on EndIP

在StartIP（包含EndIP列）上创建包含的索引

Create included Index on StartIP (included column EndIP)

在EndIP上创建包含的索引（包含在StartIP列中）

Create included Index on EndIP (included column StartIP)

在StartIP上创建索引，EndIP

Create Index on StartIP,EndIP

有人可以建议我一些建议吗？

Can someone please suggest me some approach to achieve this?

推荐答案

如果您希望每本书都拥有一个货架价值，可以尝试：

If you want one shelve value for each book, you can try:

SELECT b.*,
       (SELECT TOP 1 s.ShelveId
        FROM Shelve S
        WHERE b.BookId >= s.RangeStart
        ORDER BY s.RangeStart DESC
       ) as ShelveId
FROM Book B;

这应该有效地利用 Shelve（RangeStart，ShelveId）上的索引。

这假设您要一个 ShelveId ，并且书本范围没有重叠

This assumes that you want one ShelveId and the book ranges are not overlapping.

我很好奇您真正的应用程序是什么。据我所知，没有图书馆拥有数亿本书。

I am curious what your real application is. No libraries (to the best of my knowledge) have hundreds of millions of books.

编辑：

您可以使用 case 语句处理丢失的 ShelveId ：

You can handle the missing ShelveId with a case statement:

SELECT b.*,
       (SELECT TOP 1 (case when b.BookId between s.RangeStart and s.RangeEnd then s.ShelveId end)
        FROM Shelve S
        WHERE b.BookId >= s.RangeStart
        ORDER BY s.RangeStart DESC
       ) as ShelveId
FROM Book B;

如果其他假设均成立，则可能会解决您的问题。

If the other assumptions are true, this may solve your problem.

编辑II：

如果需要其他属性，请尝试交叉套用。它应该具有类似的性能特征：

If you want other attributes, then try cross apply. It should have similar performance characteristics:

SELECT b.*,
       s.*
FROM Book B CROSS APPLY
     (SELECT TOP 1 (case when b.BookId between s.RangeStart and s.RangeEnd then s.ShelveId end) as RangeStart, . . .
        FROM Shelve S
        WHERE b.BookId >= s.RangeStart
        ORDER BY s.RangeStart DESC
       ) s

现在，进行一些实验。我想写：

Now, comes a bit of experimentation. I would like to write:

SELECT b.*,
       s.*
FROM Book B CROSS APPLY
     (SELECT TOP 1 s.*
        FROM Shelve S
        WHERE b.BookId >= s.RangeStart and b.BookId <= s.RangeEnd
        ORDER BY s.RangeStart DESC
       ) s

但是，这可能会混淆优化引擎并阻止使用索引。如果可行，那就太好了。如果不起作用，我建议您将第一个版本与每个变量的 case 一起使用。或者，使用相关的子查询版本并连接回主键上的 Shelve 表。

But, this might confuse the optimization engine and prevent the use of the index. If it works, great. If it doesn't work, I would suggest either using the first version with a case for every variable. Or, use the correlated subquery version and join back to the Shelve table on a primary key.

这篇关于2列范围查询的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

2列范围查询 [英] Range queries on 2 columns

问题描述

推荐答案

相关文章

数据库最新文章

热门教程

热门工具

登录关闭

2列范围查询 [英] Range queries on 2 columns

问题描述

推荐答案

相关文章

数据库最新文章

热门教程

热门工具

登录 关闭

登录关闭