Optimal performing query for latest record for each N

Views: 77 Published: 2020/10/16 4:58:54 sql-server performance tsql greatest-n-per-group database-performance


Problem description

Here is the scenario I find myself in.

I have a reasonably big table that I need to query the latest records from. Here is the CREATE statement for the columns essential to the query:

CREATE TABLE [dbo].[ChannelValue](
   [ID] [bigint] IDENTITY(1,1) NOT NULL,
   [UpdateRecord] [bit] NOT NULL,
   [VehicleID] [int] NOT NULL,
   [UnitID] [int] NOT NULL,
   [RecordInsert] [datetime] NOT NULL,
   [TimeStamp] [datetime] NOT NULL
   ) ON [PRIMARY]
GO

The ID column is the primary key, and there is a nonclustered index on TimeStamp and VehicleID:

CREATE NONCLUSTERED INDEX [IX_ChannelValue_TimeStamp_VehicleID] ON [dbo].[ChannelValue] 
(
    [TimeStamp] ASC,
    [VehicleID] ASC
)ON [PRIMARY]
GO

The table I'm using to optimise my query has a little over 23 million rows, and that is only a tenth of the size the query will need to operate against.

I need to return the latest row for each VehicleID.

I've been looking through the responses to this question here on StackOverflow and I've done a fair bit of Googling and there seem to be 3 or 4 common ways of doing this on SQL Server 2005 and upwards.

So far the fastest method I've found is the following query:

SELECT cv.*
FROM ChannelValue cv
WHERE cv.TimeStamp = (
    SELECT MAX(TimeStamp)
    FROM ChannelValue
    WHERE ChannelValue.VehicleID = cv.VehicleID
)

With the current amount of data in the table it takes about 6s to execute, which is within reasonable limits, but with the amount of data the table will contain in the live environment the query begins to perform too slowly.

Looking at the execution plan my concern is around what SQL Server is doing to return the rows.

I cannot post the execution plan image because my reputation isn't high enough, but the index scan is reading every single row in the table, which is what slows the query down so much.

I've tried rewriting the query with several different methods including using the SQL 2005 Partition method like this:

WITH cte
AS (
    SELECT *,
    ROW_NUMBER() OVER(PARTITION BY VehicleID ORDER BY TimeStamp DESC) AS seq
     FROM ChannelValue
)

SELECT
   VehicleID,
   TimeStamp,
   Col1
FROM cte
WHERE seq = 1

But the performance of that query is worse by a large margin.

I've tried restructuring the query like this, but the speed and the execution plan are nearly identical:

SELECT cv.*
FROM (
   SELECT VehicleID
    ,MAX(TimeStamp) AS [TimeStamp]
   FROM ChannelValue
   GROUP BY VehicleID
) AS [q]
INNER JOIN ChannelValue cv
   ON cv.VehicleID = q.VehicleID
   AND cv.TimeStamp = q.TimeStamp

I have some flexibility available to me around the table structure (although to a limited degree) so I can add indexes, indexed views and so forth or even additional tables to the database.

I would greatly appreciate any help at all here.

Edit: Added the link to the execution plan image.

Solution

Depends on your data (how many rows are there per group?) and your indexes.

See Optimizing TOP N Per Group Queries for some performance comparisons of 3 approaches.

In your case, with millions of rows for only a small number of Vehicles, I would add an index on VehicleID, TimeStamp and do:

SELECT CA.*
FROM   Vehicles V
       CROSS APPLY (SELECT TOP 1 *
                    FROM   ChannelValue CV
                    WHERE  CV.VehicleID = V.VehicleID
                    ORDER  BY TimeStamp DESC) CA
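
The answer does not spell out the index itself. As a minimal sketch, it might be created as follows, reusing the table and column names from the question; the index name IX_ChannelValue_VehicleID_TimeStamp is hypothetical, and the descending order on TimeStamp is an assumption (an ascending key should also work, since SQL Server can read an index backwards):

```sql
-- Hypothetical index name; table and columns are taken from the question's DDL.
-- VehicleID leads so that each per-vehicle lookup can seek on it;
-- TimeStamp DESC puts the newest row for each vehicle first (an assumption,
-- the answer only says "an index on VehicleID, Timestamp").
CREATE NONCLUSTERED INDEX [IX_ChannelValue_VehicleID_TimeStamp]
    ON [dbo].[ChannelValue] ([VehicleID] ASC, [TimeStamp] DESC);
GO
```

With such an index, each CROSS APPLY probe can seek to the newest row for one vehicle instead of scanning the table. Because the query selects every column, that row probably still needs one lookup into the base table per vehicle. Note that the query also assumes a Vehicles table with one row per VehicleID, which the question does not show.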
