SQL String comparison speed 'like' vs 'patindex'


Question

I had a query as follows (simplified)...

SELECT     *
FROM       table1 AS a
INNER JOIN table2 AS b ON (a.name LIKE '%' + b.name + '%')

For my dataset this was taking around 90 seconds to execute, so I have been looking for ways of speeding it up. For no good reason, I thought I'd try PATINDEX instead of LIKE...

SELECT     *
FROM       table1 AS a
INNER JOIN table2 AS b ON (PATINDEX('%' + b.name + '%', a.name) > 0)

On the same dataset this executes in the blink of an eye and returns the same results.

Can anyone explain why LIKE is so much slower than PATINDEX? Given that LIKE is just returning a BOOLEAN whereas PATINDEX is returning the actual location I would have expected the latter to be slower if anything, or is it simply a matter of how efficiently the two functions have been written?

Ok, here is each query in full, followed by its execution plan. "#StakeholderNames" is just a temp table of likely names which I am matching against.

I have pulled back the live data and run each query several times. The first is taking about 17 seconds (so somewhat less than the original 90 seconds on the live database) and the second under 1 second...

SELECT              sh.StakeholderID,
                    sh.HoldingID,
                    i.AgencyCommissionImportID,
                    1

    FROM            AgencyCommissionImport AS i
    INNER JOIN      #StakeholderNames AS sn ON REPLACE(REPLACE(i.ClientName,' ',''), ',','') LIKE '%' + sn.Name + '%'
    INNER JOIN      Holding AS h ON (h.ProviderName = i.Provider) AND (h.HoldingReference = i.PlanNumber)
    INNER JOIN      StakeholderHolding AS sh ON (sn.StakeholderID = sh.StakeholderID) AND (h.HoldingID = sh.HoldingID)
    WHERE           i.AgencyCommissionFileID = @AgencyCommissionFileID
                AND (i.MatchTypeID = 0)
                AND ((i.MatchedHoldingID IS NULL)
                    OR (i.MatchedStakeholderID IS NULL))

   |--Table Insert(OBJECT:([tempdb].[dbo].[#Results]), SET:([#Results].[StakeholderID] = [AttivoGroup_copy].[dbo].[StakeholderHolding].[StakeholderID] as [sh].[StakeholderID],[#Results].[HoldingID] = [AttivoGroup_copy].[dbo].[StakeholderHolding].[HoldingID] as [sh].[HoldingID],[#Results].[AgencyCommissionImportID] = [AttivoGroup_copy].[dbo].[AgencyCommissionImport].[AgencyCommissionImportID] as [i].[AgencyCommissionImportID],[#Results].[MatchTypeID] = [Expr1014],[#Results].[indx] = [Expr1013]))
        |--Compute Scalar(DEFINE:([Expr1014]=(1)))
             |--Compute Scalar(DEFINE:([Expr1013]=getidentity((1835869607),(2),N'#Results')))
                  |--Top(ROWCOUNT est 0)
                       |--Hash Match(Inner Join, HASH:([h].[ProviderName], [h].[HoldingReference])=([i].[Provider], [i].[PlanNumber]), RESIDUAL:([AttivoGroup_copy].[dbo].[Holding].[ProviderName] as [h].[ProviderName]=[AttivoGroup_copy].[dbo].[AgencyCommissionImport].[Provider] as [i].[Provider] AND [AttivoGroup_copy].[dbo].[Holding].[HoldingReference] as [h].[HoldingReference]=[AttivoGroup_copy].[dbo].[AgencyCommissionImport].[PlanNumber] as [i].[PlanNumber] AND [Expr1015] like [Expr1016]))
                            |--Nested Loops(Inner Join, OUTER REFERENCES:([sh].[HoldingID]))
                            |    |--Nested Loops(Inner Join, OUTER REFERENCES:([sn].[StakeholderID]))
                            |    |    |--Compute Scalar(DEFINE:([Expr1016]=('%'+#StakeholderNames.[Name] as [sn].[Name])+'%', [Expr1017]=LikeRangeStart(('%'+#StakeholderNames.[Name] as [sn].[Name])+'%'), [Expr1018]=LikeRangeEnd(('%'+#StakeholderNames.[Name] as [sn].[Name])+'%'), [Expr1019]=LikeRangeInfo(('%'+#StakeholderNames.[Name] as [sn].[Name])+'%')))
                            |    |    |    |--Table Scan(OBJECT:([tempdb].[dbo].[#StakeholderNames] AS [sn]))
                            |    |    |--Clustered Index Seek(OBJECT:([AttivoGroup_copy].[dbo].[StakeholderHolding].[PK_StakeholderHolding] AS [sh]), SEEK:([sh].[StakeholderID]=#StakeholderNames.[StakeholderID] as [sn].[StakeholderID]) ORDERED FORWARD)
                            |    |--Clustered Index Seek(OBJECT:([AttivoGroup_copy].[dbo].[Holding].[PK_Holding] AS [h]), SEEK:([h].[HoldingID]=[AttivoGroup_copy].[dbo].[StakeholderHolding].[HoldingID] as [sh].[HoldingID]) ORDERED FORWARD)
                            |--Compute Scalar(DEFINE:([Expr1015]=replace(replace([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[ClientName] as [i].[ClientName],' ',''),',','')))
                                 |--Clustered Index Scan(OBJECT:([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[PK_AgencyCommissionImport] AS [i]), WHERE:([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[AgencyCommissionFileID] as [i].[AgencyCommissionFileID]=[@AgencyCommissionFileID] AND [AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchTypeID] as [i].[MatchTypeID]=(0) AND ([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchedHoldingID] as [i].[MatchedHoldingID] IS NULL OR [AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchedStakeholderID] as [i].[MatchedStakeholderID] IS NULL)))


SELECT              sh.StakeholderID,
                    sh.HoldingID,
                    i.AgencyCommissionImportID,
                    1

    FROM            AgencyCommissionImport AS i
    INNER JOIN      #StakeholderNames AS sn ON (PATINDEX('%' + sn.Name + '%', REPLACE(REPLACE(i.ClientName,' ',''), ',','')) > 0)
    INNER JOIN      Holding AS h ON (h.ProviderName = i.Provider) AND (h.HoldingReference = i.PlanNumber)
    INNER JOIN      StakeholderHolding AS sh ON (sn.StakeholderID = sh.StakeholderID) AND (h.HoldingID = sh.HoldingID)
    WHERE           i.AgencyCommissionFileID = @AgencyCommissionFileID
                AND (i.MatchTypeID = 0)
                AND ((i.MatchedHoldingID IS NULL)
                    OR (i.MatchedStakeholderID IS NULL))

   |--Table Insert(OBJECT:([tempdb].[dbo].[#Results]), SET:([#Results].[StakeholderID] = [AttivoGroup_copy].[dbo].[StakeholderHolding].[StakeholderID] as [sh].[StakeholderID],[#Results].[HoldingID] = [AttivoGroup_copy].[dbo].[StakeholderHolding].[HoldingID] as [sh].[HoldingID],[#Results].[AgencyCommissionImportID] = [AttivoGroup_copy].[dbo].[AgencyCommissionImport].[AgencyCommissionImportID] as [i].[AgencyCommissionImportID],[#Results].[MatchTypeID] = [Expr1014],[#Results].[indx] = [Expr1013]))
        |--Compute Scalar(DEFINE:([Expr1014]=(1)))
             |--Compute Scalar(DEFINE:([Expr1013]=getidentity((1867869721),(2),N'#Results')))
                  |--Top(ROWCOUNT est 0)
                       |--Hash Match(Inner Join, HASH:([h].[ProviderName], [h].[HoldingReference])=([i].[Provider], [i].[PlanNumber]), RESIDUAL:([AttivoGroup_copy].[dbo].[Holding].[ProviderName] as [h].[ProviderName]=[AttivoGroup_copy].[dbo].[AgencyCommissionImport].[Provider] as [i].[Provider] AND [AttivoGroup_copy].[dbo].[Holding].[HoldingReference] as [h].[HoldingReference]=[AttivoGroup_copy].[dbo].[AgencyCommissionImport].[PlanNumber] as [i].[PlanNumber] AND patindex([Expr1015],[Expr1016])>(0)))
                            |--Nested Loops(Inner Join, OUTER REFERENCES:([sh].[HoldingID]))
                            |    |--Nested Loops(Inner Join, OUTER REFERENCES:([sn].[StakeholderID]))
                            |    |    |--Compute Scalar(DEFINE:([Expr1015]=('%'+#StakeholderNames.[Name] as [sn].[Name])+'%'))
                            |    |    |    |--Table Scan(OBJECT:([tempdb].[dbo].[#StakeholderNames] AS [sn]))
                            |    |    |--Clustered Index Seek(OBJECT:([AttivoGroup_copy].[dbo].[StakeholderHolding].[PK_StakeholderHolding] AS [sh]), SEEK:([sh].[StakeholderID]=#StakeholderNames.[StakeholderID] as [sn].[StakeholderID]) ORDERED FORWARD)
                            |    |--Clustered Index Seek(OBJECT:([AttivoGroup_copy].[dbo].[Holding].[PK_Holding] AS [h]), SEEK:([h].[HoldingID]=[AttivoGroup_copy].[dbo].[StakeholderHolding].[HoldingID] as [sh].[HoldingID]) ORDERED FORWARD)
                            |--Compute Scalar(DEFINE:([Expr1016]=replace(replace([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[ClientName] as [i].[ClientName],' ',''),',','')))
                                 |--Clustered Index Scan(OBJECT:([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[PK_AgencyCommissionImport] AS [i]), WHERE:([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[AgencyCommissionFileID] as [i].[AgencyCommissionFileID]=[@AgencyCommissionFileID] AND [AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchTypeID] as [i].[MatchTypeID]=(0) AND ([AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchedHoldingID] as [i].[MatchedHoldingID] IS NULL OR [AttivoGroup_copy].[dbo].[AgencyCommissionImport].[MatchedStakeholderID] as [i].[MatchedStakeholderID] IS NULL)))

Recommended answer

That kind of repeatable difference in performance is most likely due to a difference in the execution plans for the two queries.

Have SQL Server return the actual execution plan when each query is run, and compare the execution plans.

Also, when you compare the performance of the two queries, run each query twice and throw out the timing for the first run. (The first run may include a lot of heavy lifting, such as statement parsing and database I/O.) The second run will give you an elapsed time that can be more validly compared with the other query.
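As a rough sketch of the advice above (not the original poster's actual session), the following T-SQL turns on timing, I/O, and actual-plan output for the current session and then runs both variants. It reuses the `table1`, `table2`, and `name` identifiers from the simplified query in the question; the column aliases are added only for illustration.

```sql
-- Report parse/compile time and execution time for each statement.
SET STATISTICS TIME ON;
-- Report logical and physical reads per table.
SET STATISTICS IO ON;
-- Return the actual execution plan (as XML) after each statement runs.
SET STATISTICS XML ON;

-- Variant 1: LIKE. Execute this statement twice and disregard the first timing.
SELECT     a.name AS a_name, b.name AS b_name
FROM       table1 AS a
INNER JOIN table2 AS b ON a.name LIKE '%' + b.name + '%';

-- Variant 2: PATINDEX. Again, execute twice and disregard the first timing.
SELECT     a.name AS a_name, b.name AS b_name
FROM       table1 AS a
INNER JOIN table2 AS b ON PATINDEX('%' + b.name + '%', a.name) > 0;

-- Restore the session defaults.
SET STATISTICS XML OFF;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the two plans operator by operator, together with the reported CPU and elapsed times, is usually more informative than wall-clock time alone.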

"Can anyone explain why LIKE is so much slower than PATINDEX?"

The execution plan for each query will likely explain the difference.

"Is it simply a matter of how efficiently the two functions have been written?"

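It's not really a matter of how efficiently the functions have been written. What really matters is the execution plan that is generated: whether the predicates are sargable, and whether the optimizer chooses to use the available indexes.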
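To make "sargable" concrete, here is a minimal sketch. The `#Names` table, its index, and the sample values are hypothetical and exist only for illustration. A pattern with a fixed prefix can be turned into a range seek on an index, whereas a leading wildcard or a function wrapped around the column generally forces every value to be examined.

```sql
-- Hypothetical table and index, for illustration only.
CREATE TABLE #Names
(
    NameID int IDENTITY(1,1) PRIMARY KEY,
    Name   varchar(100) NOT NULL
);
CREATE INDEX IX_Names_Name ON #Names (Name);

INSERT INTO #Names (Name) VALUES ('Smith'), ('Smithson'), ('Goldsmith');

-- Sargable: the fixed prefix lets the optimizer consider an index seek
-- on IX_Names_Name (it may still pick a scan on a table this small).
SELECT NameID FROM #Names WHERE Name LIKE 'Smith%';

-- Not sargable: the leading wildcard means no usable range on the index.
SELECT NameID FROM #Names WHERE Name LIKE '%Smith%';

-- Not sargable: the column is passed through a function.
SELECT NameID FROM #Names WHERE PATINDEX('%Smith%', Name) > 0;

DROP TABLE #Names;
```

Note that both join predicates in the question start with a leading `%`, so neither form can seek on an index over the name column; any difference between them has to come from elsewhere in the plan.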

In the quick test I ran, I see a difference in the execution plans. With the LIKE operator in the join predicate, the plan includes a "Table Spool (Lazy Spool)" operation on table2 after the "Compute Scalar" operation. With the PATINDEX function, I don't see a "Table Spool" operation in the plan. But the plans I'm getting may be significantly different from the plans you get, given differences in the queries, tables, indexes and statistics.
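For anyone who wants to try a comparable quick test, the following is one possible setup. It is an assumption-laden sketch, not the test that was actually run for this answer: the temp tables, row counts, and generated values are all hypothetical, and only the join predicates come from the question.

```sql
-- Hypothetical test tables mirroring the simplified query in the question.
CREATE TABLE #table1 (name varchar(200) NOT NULL);
CREATE TABLE #table2 (name varchar(50)  NOT NULL);

-- Short "needle" values such as N1, N2, ...
INSERT INTO #table2 (name)
SELECT TOP (500)
       'N' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS varchar(10))
FROM   sys.all_objects;

-- Longer "haystack" values such as 'Client N1 Ltd', 'Client N2 Ltd', ...
INSERT INTO #table1 (name)
SELECT TOP (20000)
       'Client N' + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS varchar(10)) + ' Ltd'
FROM   sys.all_objects AS o1
CROSS JOIN sys.all_objects AS o2;

-- Variant 1: LIKE in the join predicate.
SELECT COUNT(*) AS like_matches
FROM       #table1 AS a
INNER JOIN #table2 AS b ON a.name LIKE '%' + b.name + '%';

-- Variant 2: PATINDEX in the join predicate.
SELECT COUNT(*) AS patindex_matches
FROM       #table1 AS a
INNER JOIN #table2 AS b ON PATINDEX('%' + b.name + '%', a.name) > 0;

DROP TABLE #table1;
DROP TABLE #table2;
```

The two counts should agree. Whether the timings differ, and by how much, will depend on the SQL Server version, data volumes, collation, and statistics, so results from a setup like this may not match the figures reported in the question.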

The only difference I see in the execution plan output for the two queries (aside from expression placeholder names) is the calls to the three internal functions (LikeRangeStart, LikeRangeEnd, and LikeRangeInfo) in place of one call to the PATINDEX function. These functions appear to be called for each row in a result set, and the resulting expressions are used for the scan of the inner table in a nested loop.

So, it does look as if the three function calls for the LIKE operator could be more expensive (in elapsed time) than the single call to the PATINDEX function. (The execution plan shows those functions being called for each row in the outer result set of a nested loop join; for a large number of rows, even a slight difference in elapsed time could be multiplied enough times to produce a significant performance difference.)

After running some test cases on my system, I'm still baffled at the results you are seeing.

Maybe it is an issue with the performance of the calls to the PATINDEX function vs. the calls to the three internal functions (LikeRangeStart, LikeRangeEnd, LikeRangeInfo.)

It's possible that with those performed on a "large" enough result set, a small difference in elapsed time could be multiplied into a significant difference.

But I actually find it to be somewhat surprising that a query using the LIKE operator would take significantly longer to execute than an equivalent query using the PATINDEX function.
