比较运算符对查询执行持续时间的巨大影响 [英] Ridiculously huge impact of comparison operator on query execution duration

查看:20
本文介绍了比较运算符对查询执行持续时间的巨大影响的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我定义了以下视图:

dsplit_base - 4 个查询的联合,每个查询都是事实表和映射表之间的简单连接(包含调用统计信息);它由 201 列组成

dsplit_base - a union of 4 queries each of which is a simple join between fact and mapping tables (contains call statistics); it consists of 201 columns

calls_check - 派生自 dsplit_base 的视图,旨在用于数据一致性检查.定义如下:

calls_check - a view derived from the dsplit_base meant to be used in data consistency check. Here is the definition:

SELECT a.Brand,
       a.[Call Center] ,
       c.date,
       c.weekday,
       COUNT(*) vol,
       cast((COUNT(*)-g.vol) AS real)/g.vol*100 vol_diff ,
       SUM(abncalls+acdcalls) calls ,
       CASE
           WHEN g.calls<>0 THEN cast((SUM(abncalls+acdcalls)-g.calls) AS real)/g.calls*100
           ELSE CASE
                    WHEN SUM(abncalls+acdcalls)<>0 THEN 100
                    ELSE 0
                END
       END calls_diff
FROM dsplit_base a
JOIN calendar c ON a.ROW_DATE=c.date
JOIN
  ( SELECT t.Brand,
           t.[Call Center],
           c.weekday,
           avg(cast(vol AS bigint)) vol,
           AVG(cast(calls AS bigint)) calls
   FROM
     ( SELECT Brand,
              [Call Center], row_date, COUNT(*) vol, SUM(abncalls+acdcalls) calls from dsplit_base group by ROW_DATE, [Call Center],
              Brand ) t
   JOIN calendar c ON t.row_date=c.date
   GROUP BY c.weekday,
            t.[Call Center],
            t.Brand) g ON c.weekday=g.weekday
AND a.Brand=g.Brand
AND a.[Call Center]=g.[Call Center]
GROUP BY c.date,
         c.weekday,
         g.vol,
         g.calls,
         a.[Call Center],
         a.Brand

以下查询在 1-3 秒内产生大约 16000 行:

The following query yields around 16000 rows in 1-3 seconds:

    select * from calls_check

Brand   Call Center date    weekday     vol vol_diff    calls   calls_diff
LMN Munich      2008-01-24  Thursday    3   -25     470 8.796296
LMN Munich      2008-04-26  Saturday    3   0       352 51.72414
...

现在我遇到的实际问题是当我试图在有限的时间内拉出结果.通过添加如下 where 子句,查询将不会完成(肯定不会在大约 10 分钟内完成):

Now the actual problem I encountered is when I tried to pull out results for limited period of time. By adding where clause as follows the query will not finish (surely not in ~10 minutes):

    select * from calls_check
    where date >= DATEADD(d, -8, sysdatetime())

而且,更奇怪的是,这个查询在一秒钟内成功执行!

And, what is maybe even weirder, this query executes successfully in a second!

    select * from calls_check
    where date < DATEADD(d, -8, sysdatetime())

谁能告诉为什么 where 子句中的比较运算符会产生如此大的不同?为什么 < 似乎非常有效地对结果集进行切片,而 = 使查询无响应?

Can anybody tell why comparison operator in where clause makes such a difference? Why < seems to very efficiently slice the result set while > or = makes the query unresponsive?

dsplit_base 视图由 4 个表联合(带连接)组成.这是他们的行数:

The dsplit_base view consists of 4 tables union (with joins). Here are their row counts:

dsplit_DE - 2521

dsplit_DE - 2521

dsplit_WNS - 7243

dsplit_WNS - 7243

dsplit_US - 121451

dsplit_US - 121451

合作伙伴 - 377841 (166043)

partners - 377841 (166043)

'partners' 表的实际行数是 166043,因为在视图中它需要在这种情况下的行:

actual 'partners' table row count is 166043 because in the view it takes rows on this condition:

from partners p join splitdim s 
ON p.[Skill Name]=s.SPLITNAME and (p.Date>=s.[start_date] or s.[start_date] is null) and (p.DATE<=s.[end_date] or s.[end_date] is null)
where s.[Call center] IN ('Sitel', 'TRX', 'Sellbytel') 
OR (s.[Call center]='WNS' and p.Date<(select MIN(row_Date) from dsplit_WNS))
OR (s.[Call Center]='Munich' and (p.Date<'2012-06-29' or p.Date between '2012-08-01' and '2012-08-27'))

我尝试修改视图定义并发现:

I experimented with modified view definition and found out that:

使用 dsplit_DE 和/或 dsplit_WNS 的视图只有两个查询的工作速度都非常快(1-2 秒)

having the view with dsplit_DE and/or dsplit_WNS only both queries work pretty fast (1-2 seconds)

只有合作伙伴的 '>=' 查询需要大约 30 秒;使用 dsplit_US 只需要大约 60 秒

with partners only the '>=' query took ~30s ; with dsplit_US only it took ~60s

这里是后者的实际执行计划EXEC PLAN

here is the actual execution plan of the latter EXEC PLAN

最后两个表比其他表大得多,但有几十万条记录,应该不会花这么长时间.什么导致执行时间的差异取决于 '<'还是 where 子句中使用的>"运算符?

The last two table are much bigger than others yet with a few hundred thousands of records it should not take so long. What causes the difference in execution time depenending on '<' or '>' operator used in where clause?

推荐答案

据我所知 <>= 是不同的例如如果 x <4 的 X 值是小于 4 或 3,2,1... 同时 如果 x >=4 的 X 值是 4,5,6....意思是说它们产生不同的结果,也许背后的原因是

as far as i know < and >= are different For example What value of X if x <4 it would be Less than 4 or 3,2,1... meanwhile What value of X if x >=4 it would be 4,5,6..... meaning to say they yield different result, and maybe the reason behind is that

     select * from calls_check
where date < DATEADD(d, -8, sysdatetime())

只检索最少的数据

虽然

 select * from calls_check
    where date >= DATEADD(d, -8, sysdatetime())

检索大量数据,这就是为什么需要大量时间来检索数据

Retrieves a lot of data that's why it needs a lot of time to retrieve it

而且你在比较日期时需要非常小心,应该都在 datetime 中而不是 string 它会导致错误并且不会产生预期的结果

And also you need to be very careful when comparing dates, the should be both in datetime not in string it will cause bugs and it will not yield the expected result

希望这有助于谢谢:)

这篇关于比较运算符对查询执行持续时间的巨大影响的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆