Sql 的最后 4 个空格会损害 Sql Server 的性能吗? [英] Last 4 Whitespaces of Sql Would Hurt Sql Server Performance?

查看:66
本文介绍了Sql 的最后 4 个空格会损害 Sql Server 的性能吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我遇到了一个奇怪的问题.也许这里有人可以帮助我.

I have run a weird problem. Maybe someone here can help me.

我的 SQL Server 版本是 2008 R2.它在具有 24 个内核的服务器上运行.

My SQL Server version is 2008 R2. It runs at a server with 24 cores.

我有 2 个查询字符串:

I have 2 query strings:

String SQL_1 = "select 
                   t.testconfig_id, t.minuteSequence, t.location_id, 
                   sum(t.vuPerNode) as totalVu, 
                   sum(t.backOffPctSum) / sum(t.recordNum) as avgBackOffPct
                from
                   (select
                       p.testconfig_id, p.minuteSequence, r.location_Id, 
                       SUM(p.activeCount) * 1.0 / COUNT(1) as vuPerNode,
                       SUM(p.backOffPct) as backOffPctSum, COUNT(1) as recordNum
                    from
                       loadtest_progress_in_minute p (nolock)
                    join 
                       loadtestRunrecord r (nolock) on p.test_id = r.test_id and p.nodeId = r.nodeId                                               
                    where
                       p.test_id = ? 
                    group by 
                       p.testconfig_id, p.minuteSequence, p.nodeId, r.location_id) t
                group by
                   t.testconfig_id, t.minuteSequence, t.location_id
                order by
                   t.testconfig_id, t.minuteSequence, t.location_id  option (maxdop 23)"

String SQL-2 = "select 
                   t.testconfig_id, t.minuteSequence, t.location_id, 
                   sum(t.vuPerNode) as totalVu, 
                   sum(t.backOffPctSum) / sum(t.recordNum) as avgBackOffPct
                from
                   (select
                       p.testconfig_id, p.minuteSequence, r.location_Id, 
                       SUM(p.activeCount) * 1.0 / COUNT(1) as vuPerNode,
                       SUM(p.backOffPct) as backOffPctSum, COUNT(1) as recordNum
                    from
                       loadtest_progress_in_minute p (nolock)
                    join 
                       loadtestRunrecord r (nolock) on p.test_id = r.test_id and p.nodeId = r.nodeId                                               
                    where
                       p.test_id = ? 
                    group by 
                       p.testconfig_id, p.minuteSequence, p.nodeId, r.location_id) t
                group by
                   t.testconfig_id, t.minuteSequence, t.location_id
                order by
                   t.testconfig_id, t.minuteSequence, t.location_id  option (maxdop 23)       "

这两个查询的唯一区别是 SQL-2 在末尾多了一个制表符.

The only difference between the two queries is that SQL-2 has one extra tab character in the end.

我在同一环境中使用以下代码运行这两个查询:

I run these 2 queries with the following code in the same environment:

PreparedStatemen ps = conn.prepareStatement(SQL_1);
//PreparedStatemen ps = conn.prepareStatement(SQL_2);

ps.setLong(1234);
ps.execute();

我发现有时代码片段中 2 个查询的性能非常不同.

I found the performance of the 2 queries is very different in the code snippet sometimes.

ps.excute() 仅花费大约 10 秒.我看到 SQL_1 运行时 CPU 使用率很高.服务器的24个CPU都用完了.

ps.excute() of SQL_1 in this code snippet only costs about 10 seconds. I see the CPU usage is high when SQL_1 is running. 24 CPU of the server is all utilized.

但是 ps.excute() 在同一代码段中的 SQL_2 花费大约 150 秒.SQL_2 运行时 CPU 使用率低.仅使用了 24 个 CPU 中的 2 个.

But ps.excute() of SQL_2 in the same code snippet costs about 150 seconds. The CPU usage is slow when SQL_2 is running. Only 2 of 24 CPU are utilized.

SQL_1 和 SQL_2 同时运行.

SQL_1 and SQL_2 are running at the same time.

但上述观察并不总是如此.有时 SQL_1 和 SQL_2 的 ps.execute() 性能是相同的.有时 ps.execute() SQL_1 和 SQL_2 的性能就像我上面描述的那样.

But the observation above is not always so. Sometime ps.execute() performance of SQL_1 and SQL_2 are same. Sometime ps.execute() performance of SQL_1 and SQL_2 are as what I described above.

这就是我发现的.这很令人困惑.SQL 的最后一个空格会影响 SQL Server 性能吗?

That's what I found. It's very confusing. Last whitespace of SQL would hurt SQL Server performance?

我认为这不是由自动参数化缓存引起的,如SQL Server 2008 R2 中的空间会降低性能

I think it is not caused by auto-parameterization cache as described in Space in SQL Server 2008 R2 slows down performance

因为我通过在无限循环中长时间调用代码片段得到了上述观察结果.

Because I got the above observation by invoking the code snippet in an infinite loop for long time.

从wireshark我发现Microsoft JDBC会在SQL字符串和参数之间附加2个额外的制表符(1个制表符= 4个空格字符),如下所示:

From wireshark I found Microsoft JDBC would append 2 extra tab(1 tab character = 4 space character) characters between the SQL String and Parameter like following:

[20] [00] [20] [00] [20] [00] [20] [00] [20] [00] [20] [00][20] [00][20] [00]

不知道是不是和我的问题有关.

I don't know whether it is related to my problem.

谢谢.

2012-8-20 更新:

我在 sql-server management studio 中执行了以下 3 个 sql.他们有相同的执行计划.但我有奇怪的发现:

I executed the 3 following sql in sql-server management studio. They have the same execution plan. But I have weird findings:

declare @sql varchar(max);

set @sql='Declare @testId bigint;set @testId = 1234;select p.testconfig_id,        p.minuteSequence,r.location_Id, SUM(p.activeCount) * 1.0 / COUNT(1) as vuPerNode, SUM(p.backOffPct) as backOffPctSum, COUNT(1) as recordNum from loadtest_progress_in_minute p with( nolock,index(idx_loadtest_progress_in_minute_1) ) join loadtestRunrecord r ( nolock ) on p.test_id = r.test_id and p.nodeId = r.nodeId where p.test_id =  @testId group by p.testconfig_id, p.minuteSequence, p.nodeId, r.location_id option (maxdop 23)';
execute (@sql);

这个sql的性能很差.大约需要 90 秒.仅使用 2 个 CPU.

This sql's performance is very bad. It takes about 90 seconds. Only 2 CPU are used.

Declare @sSQL nvarchar(2000);
Declare @paramDefine nvarchar(2000);
Declare @testId bigint;
set @testId = 1234;

set @sSQL = N'                  select              t.testconfig_id, t.minuteSequence, t.location_id, sum(t.vuPerNode) as totalVu,              sum(t.backOffPctSum) / sum(t.recordNum) as avgBackOffPct        from            (           select                  p.testconfig_id, p.minuteSequence, r.location_Id, SUM(p.activeCount) * 1.0 / COUNT(1) as vuPerNode,                 SUM(p.backOffPct) as backOffPctSum, COUNT(1) as recordNum           from                loadtest_progress_in_minute p        ( nolock )                 join                loadtestRunrecord r          ( nolock )                 on p.test_id = r.test_id and p.nodeId = r.nodeId            where               p.test_id = @P0             group by                p.testconfig_id, p.minuteSequence, p.nodeId, r.location_id          ) t                     group by            t.testconfig_id, t.minuteSequence, t.location_id        order by            t.testconfig_id, t.minuteSequence, t.location_id        option (maxdop 23)              ';
set @paramDefine = N'@P0 bigint';
execute sp_executesql @sSQL, @paramDefine, @P0 = @testId;

这个sql在管理工作室只需要10秒左右.CPU全部用完.

This sql takes only about 10 seconds in management studio. All CPU are used.

declare @p1 int
--set @p1=1
exec sp_prepexec @p1 output,N'@P0 bigint',N'                select                t.testconfig_id, t.minuteSequence, t.location_id, sum(t.vuPerNode) as totalVu,            sum(t.backOffPctSum) / sum(t.recordNum) as avgBackOffPct        from            (           select                  p.testconfig_id, p.minuteSequence, r.location_Id, SUM(p.activeCount) * 1.0 / COUNT(1) as vuPerNode,                 SUM(p.backOffPct) as backOffPctSum, COUNT(1) as recordNum           from                loadtest_progress_in_minute p        ( nolock )                 join                loadtestRunrecord r          ( nolock )                 on p.test_id = r.test_id and p.nodeId = r.nodeId            where               p.test_id = @P0             group by                p.testconfig_id, p.minuteSequence, p.nodeId, r.location_id          ) t                     group by            t.testconfig_id, t.minuteSequence, t.location_id        order by            t.testconfig_id, t.minuteSequence, t.location_id        option (maxdop 23)                      ',@P0=1234
select @p1

这个sql性能很快.只需要大约 10 秒钟.所有 CPU 都被利用.

This sql performance is fast. Only takes about 10 seconds. All CPU are utilized.

更新 2012-8-21

直到现在,我还没有得到关于我所发现的最终明确结论.因为windows world 没有打开,所以我们可能永远不会得到SQL Server 内部的细节.在这里,我只对我发现的内容进行解释.一些解释只是我的猜测.我希望它对其他人有帮助.

Until now, I have not get a final clear conclusion about what I have found. Because windows world is not open, maybe we never get the detail inside SQL Server. Here I just give my explaination about what I have found. Some explaination is just my guess. I hope that it's helpful for others.

1) 为什么有时我在 JDBC 中得到两个相似 SQL 的不同性能(这些 SQL 除了最后一个制表符是否存在外是相同的)

1) why I get the different performance for two similiar SQL in JDBC sometimes(These SQL are the same except whether the last tab character is present)

我们的测试不是在孤立的环境中进行的.当我用 JDBC 测试这两个 SQL 时,其他进程也会同时执行最后一个制表符的 SQL.所以我们的测试结果会受到其他过程的影响.

Our test is not in an isolated environment. When I am test the two SQL with JDBC, other processes also excute the SQL with the last tab character is present at the same time. So our test result is influenced by other processes.

表现不同的根本原因是他们选择了不同的执行计划.一个选择了具有良好并行性的执行计划.另一个选择了 parellesim 错误的执行计划.因为所有其他进程都在执行带有制表符的 SQL,所以没有制表符的 SQL 被视为新的 SQL.所以在执行不带制表符的SQL时,会根据记录统计,根据典型参数值生成新的执行计划.可能典型值一开始不擅长表现.但实际访问参数值直方图可以刷新执行计划缓存.没有制表符的 SQL 只在我的测试中使用.我的测试只使用参数值(1234).SQL 服务器认为 (1234) 经常被 SQL 访问,并使用实际最常访问的参数值 (1234) 刷新执行计划.所以性能变好.

The different performance's root cause is that they choose the different excution plan. One chose the exection plan with good parellesim. The other one chose the execution plan with bad parellesim. Because all the other processes are executing SQL with tab character, SQL without tab character is treated as a new SQL. So when SQL without tab character is executed, it generates a new execution plan based on typical parameter value according to records statistics. Maybe the typical value is not good at performance at first time. But actual visited parameter value histogram can flush the execution plan cache. The SQL without tab character is only used in my test. My test only use parameter value (1234). SQL server thinks that (1234) is often visited for the SQL and flush the execution plan with the actual most common visited parameter value (1234). So the performance become good.

当我切换回带有制表符的 SQL 时,SQL Server 将采用较旧的执行计划缓存.此缓存可以由其他正在运行的进程引入并受其他进程的影响.该执行计划也是根据实际最流行的访问参数值生成的.但是这个值会受到其他进程的影响.所以它可能不是 (1234) 并且基于该值的执行计划在性能上对 (1234) 不利.这就是为什么有时带有制表符的 SQL 性能很差的原因.

When I switch back to the SQL with tab character, SQL Server would take a older execution plan cache. This cache can be introduced by other running processes and influenced by others. This execution plan is also generated based on actual most popular visited parameter value. But this value is influenced by other processes. So it may not be (1234) and execution plan based on the value is not good for (1234) at performance. That's why SQL performance with tab character is bad sometimes.

因为有时我的带有制表符的 SQL 测试程序也可以刷新执行计划缓存,如果我的测试运行得足够频繁以更改实际访问的参数值.有时带有制表符的 SQL 的性能也会变得很好.

Because sometimes my test program for SQL with tab character also can flush excution plan cache if my test is running frequently enough to change the actual visited parameter value . The performance of SQL with tab character also can become good sometimes.

2) 为什么 SSMS 下面的 SQL 总是很慢

2) why the following SQL in SSMS is alway slow

declare @sql varchar(max)

set @sql='Declare @testId bigint;set @testId = 36887;select p.testconfig_id, p.minuteSequence,r.location_Id, SUM(p.activeCount) * 1.0 / COUNT(1) as vuPerNode, SUM(p.backOffPct) as backOffPctSum, COUNT(1) as recordNum from loadtest_progress_in_minute p with( nolock,index(idx_loadtest_progress_in_minute_1) ) join loadtestRunrecord r ( nolock ) on p.test_id = r.test_id and p.nodeId = r.nodeId where p.test_id =  @testId group by p.testconfig_id, p.minuteSequence, p.nodeId, r.location_id option (maxdop 23)'
execute (@sql)

因为 SSMS 也会根据记录统计的典型参数值生成执行计划.参数值 (1234) 是非典型值.所以上面的SQL一开始就慢.我猜想 SSMS 中的执行"命令是特殊的,它的缓存不会被实际最常见的访问参数值刷新.所以总是很慢.根据我的实验结果,我猜'sp_prepexec'和'sp_executesql'与'execute'不同,它们的计划缓存可以通过实际最常见的访问参数值刷新,并且与JDBC具有相似的行为.

Because SSMS also generates execution plan based on typical parameter value of records statistics. The parameter value (1234) is atypical value. So the above SQL is slow at the beginning. I guess that 'execute' command in SSMS is special and its cache won't be flushed by actual most common visited parameter value. So it is always slow. According to my experiment results, I guess 'sp_prepexec' and 'sp_executesql' are different from 'execute' and their plan cache can be flushed by actual most common visited parameter value and have a similiar behavior with JDBC.

3) 为什么添加重新编译提示会加快上述 SQL 的性能在回答这个问题之前,我们先来看看以下MSDN在线帮助文​​档中的文字.

3) why adding recompile hint would speed up the above SQL's performance Before answering this problem, let's take a look at the following text from MSDN online help document.

"指示 SQL Server 数据库引擎在执行后放弃为查询生成的计划,强制查询优化器在下次执行相同查询时重新编译查询计划.如果不指定 RECOMPILE,数据库引擎将缓存查询计划并重用它们.编译查询计划时,RECOMPILE 查询提示使用查询中任何局部变量的当前值,如果查询在存储过程中,则将当前值传递给任何参数.

"Instructs the SQL Server Database Engine to discard the plan generated for the query after it executes, forcing the query optimizer to recompile a query plan the next time the same query is executed. Without specifying RECOMPILE, the Database Engine caches query plans and reuses them. When compiling query plans, the RECOMPILE query hint uses the current values of any local variables in the query and, if the query is inside a stored procedure, the current values passed to any parameters.

RECOMPILE 是创建使用 WITH RECOMPILE 子句的存储过程的一种有用的替代方法,当必须重新编译存储过程中的查询子集而不是整个存储过程时.有关详细信息,请参阅重新编译存储过程.RECOMPILE 在您创建计划指南时也很有用."

RECOMPILE is a useful alternative to creating a stored procedure that uses the WITH RECOMPILE clause when only a subset of queries inside the stored procedure, instead of the whole stored procedure, must be recompiled. For more information, see Recompile a Stored Procedure. RECOMPILE is also useful when you create plan guides."

请注意以下句子:编译查询计划时,RECOMPILE 查询提示使用查询中任何局部变量的当前值,如果查询在存储过程中,则将当前值传递给任何参数.

Please note the following sentence : When compiling query plans, the RECOMPILE query hint uses the current values of any local variables in the query and, if the query is inside a stored procedure, the current values passed to any parameters.

这意味着 RECOMPILE 查询提示改变了 SSMS 执行计划生成行为.SSMS 根据典型参数值生成执行计划,而没有 RECOMPILE 查询提示,而 SSMS 根据当前参数值和 RECOMPILE 查询提示生成执行计划.所以重新编译提示使执行计划适合当前参数值(1234)

It means that RECOMPILE query hint change the SSMS execution plan generation behavior. SSMS generates execution plan based on typical parameter value witout RECOMPILE query hint while SSMS generates execution paln based on current parameter value with RECOMPILE query hint. So recompile hint make execution plan is perfect for current parameter value (1234)

一句话,执行计划是由复杂的因素决定的.我们必须仔细考虑.

In one word, execution plan is chosen by complex factors. We must consider it carefully.

推荐答案

首先,查询末尾的空格不会产生影响.不可思议,除了延长传输和解析语句的时间量之外,空格会导致任何数据库中的性能问题.SQL 引擎在编译阶段读取查询并生成执行计划.执行计划就是运行的东西.在对查询字符串进行标记的第一步中会丢失额外的空格.据我所知,所有数据库引擎都是这样工作的.

First, the white space at the end of the query would not make a difference. It is inconceivable that white space could cause a performance problem in any database, other than lengthening the amount of time for transmitting and parsing the statement. The SQL Engine reads the query during the compile phase and produces an execution plan. The execution plan is what gets run. Extra white space is lost in the very first step of tokenizing the query string. As far as I know, all database engines work this way.

在测试查询性能时,您需要处理导致性能变化的第一大原因:缓存.第二次运行查询时,它通常会更快,因为这些表已经在页面缓存中了.

When testing performance of queries, you need to deal with the number one cause of variability in performance: caching. The second time you run a query, it will usually go much faster, because the tables are already in the page cache.

一种方法是在两次运行之间清除缓存.另一种是多次运行查询,忽略第一次运行.

One way is to clear the cache between runs. Another is to run the query multiple times, and ignore the first run.

在任何情况下,您的第一个查询都不是正确的语法,因此这可能与您所看到的有关.选择语句是:

In any case, your first query is not proper syntax, so that might have something to do with what you are seeing. The select statement is:

select t.id1, t.sequence, t.id2, sum(t.vu) as totalVu,
       sum(t.backOffPctSum) / sum(t.recordNum) as avgBackOffPct

分组依据是:

group by t.testconfig_id, t.minuteSequence, t.location_id

变量 t.id1、t.sequence 和 t.id2 应该在 SQL Server 或任何合理的数据库中导致编译时错误,因为它们既不在聚合函数中也不在 group by 子句中(这是一个友好的挖掘 MySQL 中的隐藏列,这将允许使用这种语法).

The variables t.id1, t.sequence, and t.id2 should be causing a compile-time error in SQL Server or any reasonable database because they are neither in aggregation functions nor in the group by clause (this is a friendly dig at the hidden columns in MySQL, which would allow this syntax).

这篇关于Sql 的最后 4 个空格会损害 Sql Server 的性能吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆