针对 400 万条记录的两个完全不同的查询同时执行 - 一个使用蛮力 [英] Two radically different queries against 4 mil records execute in the same time - one uses brute force

查看:47
本文介绍了针对 400 万条记录的两个完全不同的查询同时执行 - 一个使用蛮力的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用的是 SQL Server 2008.我有一个包含超过 300 万条记录的表,它与另一个包含一百万条记录的表相关.

I'm using SQL Server 2008. I have a table with over 3 million records, which is related to another table with a million records.

我花了几天时间尝试用不同的方式来查询这些表.我把它归结为两个完全不同的查询,这两个查询都需要 6 秒才能在我的笔记本电脑上执行.

I have spent a few days experimenting with different ways of querying these tables. I have it down to two radically different queries, both of which take 6s to execute on my laptop.

第一个查询使用蛮力方法评估可能匹配,并通过聚合求和计算删除不正确的匹配.

The first query uses a brute force method of evaluating possibly likely matches, and removes incorrect matches via aggregate summation calculations.

第二个获取所有可能的匹配项,然后通过 EXCEPT 查询删除不正确的匹配项,该查询使用两个专用索引来查找低和高不匹配.

The second gets all possibly likely matches, then removes incorrect matches via an EXCEPT query that uses two dedicated indexes to find the low and high mismatches.

从逻辑上讲,人们期望蛮力是缓慢的,而索引是快速的.不是这样.我对索引进行了大量试验,直到获得最佳速度.

Logically, one would expect the brute force to be slow and the indexes one to be fast. Not so. And I have experimented heavily with indexes until I got the best speed.

此外,蛮力查询不需要那么多索引,这意味着从技术上讲,它会产生更好的整体系统性能.

Further, the brute force query doesn't require as many indexes, which means that technically it would yield better overall system performance.

下面是两个执行计划.如果您看不到它们,请告诉我,然后我会以横向方式重新发布/邮寄给您.

Below are the two execution plans. If you can't see them, please let me know and I'll re-post then in landscape orientation / mail them to you.

蛮力查询:

SELECT      ProductID, [Rank]
FROM        (
            SELECT      p.ProductID, ptr.[Rank], SUM(CASE
                            WHEN p.ParamLo < si.LowMin OR
                            p.ParamHi > si.HiMax THEN 1
                            ELSE 0
                            END) AS Fail
            FROM        dbo.SearchItemsGet(@SearchID, NULL) AS si
                        JOIN dbo.ProductDefs AS pd
            ON          pd.ParamTypeID = si.ParamTypeID
                        JOIN dbo.Params AS p
            ON          p.ProductDefID = pd.ProductDefID
                        JOIN dbo.ProductTypesResultsGet(@SearchID) AS ptr
            ON          ptr.ProductTypeID = pd.ProductTypeID
            WHERE       si.Mode IN (1, 2)
            GROUP BY    p.ProductID, ptr.[Rank]
            ) AS t
WHERE       t.Fail = 0

基于索引的异常查询:

with si AS (
    SELECT      DISTINCT pd.ProductDefID, si.LowMin, si.HiMax
    FROM        dbo.SearchItemsGet(@SearchID, NULL) AS si
                JOIN dbo.ProductDefs AS pd
    ON          pd.ParamTypeID = si.ParamTypeID
                JOIN dbo.ProductTypesResultsGet(@SearchID) AS ptr
    ON          ptr.ProductTypeID = pd.ProductTypeID
    WHERE       si.Mode IN (1, 2)
)
SELECT      p.ProductID
FROM        dbo.Params AS p
            JOIN si
ON          si.ProductDefID = p.ProductDefID
EXCEPT
SELECT      p.ProductID
FROM        dbo.Params AS p
            JOIN si
ON          si.ProductDefID = p.ProductDefID    
WHERE       p.ParamLo < si.LowMin OR p.ParamHi > si.HiMax

我的问题是,根据执行计划,哪个看起来更有效率?我意识到随着我的数据增长,情况可能会发生变化.

My question is, based on the execution plans, which one look more efficient? I realize that thing may change as my data grows.

我已经更新了索引,现在第二个查询的执行计划如下:

I have updated the indexes, and now have the following execution plan for the second query:

推荐答案

感谢大家的投入和帮助.

Thank you all for your input and help.

通过阅读您所写的内容、进行实验并深入研究执行计划,我发现答案是临界点.

From reading what you wrote, experimenting, and digging into the execution plan, I discovered the answer is tipping point.

返回的记录太多,无法保证使用索引.

There were too many records being returned to warrant use of the index.

请参阅此处(金伯利·特里普).

See here (Kimberly Tripp).

这篇关于针对 400 万条记录的两个完全不同的查询同时执行 - 一个使用蛮力的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆