在 T-SQL 中设置和查询版本记录的最佳实践 [英] Best practice for setup and querying versioned records in T-SQL
问题描述
我正在尝试优化我的 SQL 查询,我总是回到这个问题,我希望深入了解如何最好地优化它.
I'm trying to optimize my SQL queries and I always come back to this one issue and I was hoping to get some insight into how I could best optimize this.
为简洁起见,假设我有一个简单的员工表:
For brevity, lets say I have a simple employee table:
tbl_employees
Id HiredDateTime
------------------
1 ...
2 ...
在另一个表中为每个员工提供版本信息:
That has versioned information in another another table for each employee:
tbl_emplyees_versioned
Id Version Name HourlyWage
-------------------------------
1 1 Bob 10
1 2 Bob 20
1 3 Bob 30
2 1 Dan 10
2 2 Dan 20
这是在视图中检索最新版本记录的方式:
And this is how the latest version records are retrieved in a View:
Select tbl_employees.Id, employees_LatestVersion.Name, employees_LatestVersion.HourlyWage, employees_LatestVersion.Version
From tbl_employees
Inner Join tbl_employees_versioned
ON tbl_employees.Id = tbl_employees_versioned.Id
CROSS APPLY
(SELECT Id, Max(Version) AS Version
FROM tbl_employees_versioned AS employees_LatestVersion
WHERE Id = tbl_employees_versioned.Id
GROUP BY Id) AS employees_LatestVersion
要得到这样的回复:
Id Version Name HourlyWage
-------------------------------
1 3 Bob 30
2 2 Dan 20
当提取包含超过 500 条员工记录且每个记录有几个版本的查询时,此查询开始卡住并需要几秒钟才能运行.
When pulling a query that has over 500 employees records for which each have a couple few versions, this query starts choking up and takes a few seconds to run.
马上就遇到了几次打击,但我不知道如何克服它们.
There are a couple strikes right off the bat, but I'm not sure how to overcome them.
显然交叉应用会增加一些性能损失.处理这样的版本信息时是否有最佳实践?有没有更好的方法来获得最高版本的记录?
Obviously the Cross Apply adds some performance loss. Is there a best practice when dealing with versioned information like this? Is there a better way to get just a record with the highest version?
版本化表没有聚集索引,因为 Id 或 Version 都不是唯一的.将它们连接在一起,但它不会那样工作.相反,Id 有一个非聚集索引,Version 有另一个索引.有没有更好的方法来索引这个表以获得任何性能增益?索引视图真的有用吗?
The versioned table doesn't have a clustered index beause neither Id or Version are unique. Concatenated together they would be, but it doesn't work like that. Instead there is a non-clustered index for Id and another one for Version. Is there a better way to index this table to get any performance gain? Would an indexed view really help here?
推荐答案
我认为构建数据的最佳方式是使用开始日期和结束日期.因此,原始表的数据结构如下所示:
I think the best way to structure the data is using start dates and end dates. So, the data structure for your original table would look like:
create table tbl_EmployeesHistory (
EmployeeHistoryId int,
EffDate date not null,
EndDate date,
-- Fields that describe the employee during this time
)
然后,您可以使用视图查看当前版本:
Then, you can see the current version using a view:
create view vw_Employees as
select *
from tbl_EmployeesHistory
where EndDate is NULL
在某些情况下,如果允许未来结束日期,则 where 子句将是:
In some cases, where future end dates are allowed, the where clause would be:
where coalesce(EndDate, getdate()) >= getdate()
或者,在这种情况下,您可以将 EndDate 设置为未来某个很远很远的日期,例如01-o1-9999".您可以将其添加为 create table 语句中的默认值,使该列不为 null,然后您就可以始终使用该语句:
Alternatively, in this case, you can default EndDate to some future date far, far away such as '01-o1-9999'. You would add this as the default in the create table statement, make the column not null, and then you can always use the statement:
where getdate() between EffDate and EndDate
正如 Martin 在他的评论中指出的那样,coalesce() 可能会妨碍索引的使用(在 SQL Server 中是这样),而这没有这个问题.
As Martin points out in his comment, the coalesce() might impede the use of an index (it does in SQL Server), whereas this does not have that problem.
这称为缓慢变化的维度.Ralph Kimball 在他关于数据仓库的书中详细讨论了这个概念.
This is called a slowly changing dimension. Ralph Kimball discusses this concept in some length in his books on data warehousing.
这篇关于在 T-SQL 中设置和查询版本记录的最佳实践的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!