Fastest way for this query (What is the best strategy) given a date range
Question
I have a table A that has two datetime columns, startDate and endDate, besides some other columns. I have another table B that has one datetime column; call it the dates column. This is in SQL Server 2005.
Here is the question: how best to set up the indexes, etc., to get the following:
select ....
from A , B
where A.startDate >= B.dates
and A.endDate < B.dates
Both tables have several thousand records.
Recommended answer
Update:
See this article in my blog for an efficient indexing strategy for your query using computed columns:
The main idea is that we just compute a rounded length and startDate for your ranges and then search for them using equality conditions (which are good for B-Tree indexes).
In MySQL and in SQL Server 2008 you could use SPATIAL indexes (R-Tree).
They are particularly good for conditions like "select all records with a given point inside the record's range", which is just your case.
You store the start_date and end_date as the beginning and the end of a LineString (converting them to UNIX timestamps or another numeric value), index them with a SPATIAL index and search for all such LineStrings whose minimum bounding box (MBR) contains the date value in question, using MBRContains.
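As a rough illustration of the encoding described above, the following Python sketch converts a date range to UNIX timestamps and builds the WKT text for a LineString and a Point. The helper names and the x coordinates (-1, 1 and 0) are assumptions made for illustration only; they are not taken from the blog entries mentioned here. The `mbr_contains` function is a plain-Python stand-in for the bounding-box test that a spatial index would answer, not a database call.

```python
import calendar
from datetime import datetime


def to_unix(dt):
    """Convert a naive datetime, treated as UTC, to a UNIX timestamp."""
    return calendar.timegm(dt.timetuple())


def range_to_wkt(start, end):
    """WKT for a LineString whose y coordinates are the range endpoints.

    The x values -1 and 1 are an illustrative assumption so that the
    bounding box has non-zero width.
    """
    return "LINESTRING(-1 %d, 1 %d)" % (to_unix(start), to_unix(end))


def point_to_wkt(value):
    """WKT for the point that represents the date being searched for."""
    return "POINT(0 %d)" % to_unix(value)


def mbr_contains(start, end, value):
    """Plain-Python stand-in for the bounding-box containment test."""
    return to_unix(start) <= to_unix(value) <= to_unix(end)


start, end = datetime(2009, 1, 1), datetime(2009, 1, 10)
print(range_to_wkt(start, end))
print(point_to_wkt(datetime(2009, 1, 5)))
print(mbr_contains(start, end, datetime(2009, 1, 5)))
```

The strings produced this way would be the values stored in, and compared against, a geometry column; the actual spatial functions and index definitions depend on the database.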
See this entry in my blog on how to do this in MySQL:
and a brief performance overview for SQL Server:
The same solution can be applied to searching for a given IP against network ranges stored in the database.
This task, along with your query, is another often used example of such a condition.
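To show why the IP lookup is the same kind of condition, here is a small Python sketch that turns each network into a pair of integers (first and last address) and then tests the address with the same "point inside range" comparison. The network list and the function name `matching_networks` are hypothetical, chosen only for the example.

```python
import ipaddress

# Hypothetical network ranges, as they might be stored in a table.
NETWORKS = ["10.0.0.0/8", "10.1.0.0/16", "192.168.0.0/24"]


def network_bounds(cidr):
    """Return (first, last) address of a network as integers."""
    net = ipaddress.ip_network(cidr)
    return int(net.network_address), int(net.broadcast_address)


def matching_networks(ip):
    """Return every network whose integer range contains the address."""
    point = int(ipaddress.ip_address(ip))
    result = []
    for cidr in NETWORKS:
        first, last = network_bounds(cidr)
        # Same shape as the date condition: range start <= point <= range end.
        if first <= point <= last:
            result.append(cidr)
    return result


print(matching_networks("10.1.2.3"))
print(matching_networks("8.8.8.8"))
```

Note that the first two networks overlap, which is exactly the situation where a single address can fall inside several stored ranges.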
Plain B-Tree indexes are not good if the ranges can overlap.
If they cannot (and you know it), you can use the brilliant solution proposed by @AlexKuznetsov.
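One common way to exploit non-overlapping ranges, which is not necessarily the solution referred to above, is to fetch only the range with the latest start not after the point and then check its end. The sketch below runs this against an in-memory SQLite database from Python; the table layout, integer dates, half-open ranges and the `find_range` helper are assumptions for illustration. In SQL Server the `LIMIT 1` would be written as `TOP 1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, start_date INTEGER, end_date INTEGER);
    CREATE INDEX ix_a_start ON a (start_date);
""")
# Non-overlapping, half-open ranges [start_date, end_date); dates are plain integers here.
conn.executemany(
    "INSERT INTO a (start_date, end_date) VALUES (?, ?)",
    [(10, 20), (20, 30), (40, 50)],
)


def find_range(conn, point):
    """Return the id of the range containing point, or None."""
    # Only one candidate can contain the point: the range with the
    # greatest start_date that is still <= point.
    row = conn.execute(
        "SELECT id, end_date FROM a "
        "WHERE start_date <= ? ORDER BY start_date DESC LIMIT 1",
        (point,),
    ).fetchone()
    if row is not None and point < row[1]:
        return row[0]
    return None


print(find_range(conn, 25))
print(find_range(conn, 35))
```

Because only a single row is read through the index on start_date, the lookup does not depend on how many ranges start before the point.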
Also note that the performance of this query depends entirely on your data distribution.
If you have lots of records in B and few records in A, you could just build an index on B.dates and let the TS/CIS (table scan / clustered index scan) on A go.
This query will always read all rows from A and will use an Index Seek on B.dates in a nested loop.
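A minimal sketch of this setup, using Python's sqlite3 module in place of SQL Server: A is scanned in full and, for each of its rows, the index on B.dates is searched for the matching dates. The sample rows, the index name and the ISO date strings are assumptions, as is the reading of the condition as startDate <= dates < endDate. The plan text printed by SQLite differs from SQL Server's, so it is shown only as a rough analogue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, start_date TEXT, end_date TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, dates TEXT);
    CREATE INDEX ix_b_dates ON b (dates);
""")
conn.executemany(
    "INSERT INTO a (start_date, end_date) VALUES (?, ?)",
    [("2009-01-01", "2009-01-10"),
     ("2009-01-05", "2009-01-20"),
     ("2009-02-01", "2009-02-02")],
)
conn.executemany(
    "INSERT INTO b (dates) VALUES (?)",
    [("2009-01-03",), ("2009-01-07",), ("2009-01-10",), ("2009-03-01",)],
)

# ISO-formatted date strings compare correctly as text.
# CROSS JOIN makes SQLite keep a as the outer table of the loop.
query = """
    SELECT a.id, b.id
    FROM a CROSS JOIN b
    WHERE b.dates >= a.start_date
      AND b.dates < a.end_date
    ORDER BY a.id, b.id
"""
matches = conn.execute(query).fetchall()
print(matches)

# Print the plan SQLite chose; the exact wording depends on the SQLite version.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```

With few rows in A, the cost is dominated by a small number of index searches on B, which is the situation this strategy targets.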
If your data are distributed the other way round, i.e. you have lots of rows in A but few in B, and the ranges are generally short, then you could redesign your tables a little:
Store each range in A as start_date and interval_length, create an index on A (interval_length, start_date) and use this query:
SELECT *
FROM (
SELECT DISTINCT interval_length
FROM a
) ai
CROSS JOIN
b
JOIN a
ON a.interval_length = ai.interval_length
AND a.start_date BETWEEN b.date - ai.interval_length AND b.date
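To check that the redesigned query returns the same pairs as a direct range join, here is a small sketch using Python's sqlite3 module. Dates are represented as plain integers (for example, day numbers) so that the subtraction works without date functions; that representation, the sample rows and the index name are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, start_date INTEGER, interval_length INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY, date INTEGER);
    CREATE INDEX ix_a_length_start ON a (interval_length, start_date);
""")
conn.executemany(
    "INSERT INTO a (start_date, interval_length) VALUES (?, ?)",
    [(10, 5), (12, 5), (20, 2), (30, 5), (14, 2)],
)
conn.executemany("INSERT INTO b (date) VALUES (?)", [(13,), (21,), (40,)])

# Redesigned query: one equality on interval_length plus a bounded
# range on start_date for each distinct length.
redesigned = conn.execute("""
    SELECT a.id, b.id
    FROM (SELECT DISTINCT interval_length FROM a) ai
    CROSS JOIN b
    JOIN a
      ON a.interval_length = ai.interval_length
     AND a.start_date BETWEEN b.date - ai.interval_length AND b.date
    ORDER BY a.id, b.id
""").fetchall()

# Direct formulation of the same condition, for comparison.
naive = conn.execute("""
    SELECT a.id, b.id
    FROM a JOIN b
      ON b.date BETWEEN a.start_date AND a.start_date + a.interval_length
    ORDER BY a.id, b.id
""").fetchall()

print(redesigned)
print(redesigned == naive)
```

The approach pays off when the number of distinct interval_length values is small, because the query performs one bounded index range lookup per combination of a distinct length and a row of B.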