为什么第一次启动服务后这个查询很慢? [英] Why is this query slow the first time after I start the service?

查看:35
本文介绍了为什么第一次启动服务后这个查询很慢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

好的.这是我尝试运行的内容:

使用 tempdb;SELECT TOP 1000000 IDENTITY(INT, 1, 1) 数字INTO号码从 sys.objects s1交叉连接 sys.objects s2交叉连接 sys.objects s3交叉连接 sys.objects s4;

这是让我成为数字表"查询之一.

问题来了.如果我在 SQL Server 服务(重新)启动后立即运行它,它会花费很长时间.不会像十秒钟那样永远,我希望它更快.永远一样,我不小心让它超过了两个小时,但仍然不得不杀死它.我认为它永远不会永远回来.通常在我的机器上运行它需要不到两秒钟的时间.

但是,如果我这样做:

使用 tempdb;SELECT TOP 1000000 IDENTITY(INT, 1, 1) 数字INTO号码从 sys.objects s1交叉连接 sys.objects s2交叉连接 sys.objects s3;删除表号;SELECT TOP 1000000 IDENTITY(INT, 1, 1) 数字INTO号码从 sys.objects s1交叉连接 sys.objects s2交叉连接 sys.objects s3交叉连接 sys.objects s4;

然后它按您的预期工作——第一个 SELECT 运行不到两秒钟,第二个也是如此.为什么我不直接用三桌版呢?因为 sys.objects 中没有足够的条目来满足乘以等于一百万结果行的立方数.但这已经不是重点了.

无论如何,从这里开始,我可以随心所欲地重复第二个DROP/SELECT...INTO,没问题.不知何故,第一个三桌版本使它永远正常.至少,直到下次服务重新启动和/或机器重新启动.在这一点上,再次运行最后一个 SELECT 永远不会回来.再次.

这里开始变得更奇怪了.如果我将第一个 SELECT 削回两个表的版本:

使用 tempdb;SELECT TOP 1000000 IDENTITY(INT, 1, 1) 数字INTO号码从 sys.objects s1交叉连接 sys.objects s2;删除表号;SELECT TOP 1000000 IDENTITY(INT, 1, 1) 数字INTO号码从 sys.objects s1交叉连接 sys.objects s2交叉连接 sys.objects s3交叉连接 sys.objects s4;

这也使第二个 SELECT 永远运行. 单表版本也是如此.不知何故,那个三桌版本很神奇!

这里发生了什么?为什么这么慢?

(在有人指出我正在 tempdb 中创建永久表之前,是的,我知道.更改为实际临时表没有任何区别.)

<小时>

添加信息:

  • 这是 SQL Server 2012 开发人员版
  • EXEC sp_WhoIsActive @find_block_leaders = 1, @sort_order = '[blocked_session_count] DESC' 的输出(脚本为 XML,因此可以在此处读取)是:
<前><?xml version="1.0" ?><结果1><记录><dd hh:mm:ss.mss>00 00:10:45.066</dd hh:mm:ss.mss>52<sql_text><?query --SELECT TOP 1000000 IDENTITY(INT, 1, 1) 数字INTO号码从 sys.objects s1交叉连接 sys.objects s2交叉连接 sys.objects s3交叉连接 sys.objects s4;--?></sql_text><login_name>我自己的登录名已被编辑</login_name>(99ms)LCK_M_X 9,750<tempdb_allocations>713</tempdb_allocations> 702<blocking_session_id>NULL</blocking_session_id> 0<阅读> 583,273<写> 537<physical_reads> 50</physical_reads><used_memory> 3</used_memory><状态>暂停 2<percent_complete>NULL</percent_complete><host_name>我自己的机器名称已编辑</host_name><database_name>tempdb</database_name><program_name>Microsoft SQL Server Management Studio - 查询</program_name><start_time>2013-11-23 23:48:19.473</start_time><login_time>2013-11-23 23:47:47.060</login_time>0<collection_time>2013-11-23 23:59:04.560</collection_time>

<小时>

更多附加信息:

为什么我把它放在 tempdb 中是因为它是打算在原始安装上运行的脚本的一部分,并且 tempdb 保证在那里.正如我所说,更改为全局临时表并没有什么不同.

解决方案

can 也可以在我的机器上 100% 的时间重现这个问题.(见文末注释)

问题的要点在于,您在 tempdb 中的系统表行上取消了 S 锁,这些锁可能与内部 tempdb<所需的锁发生冲突/code> 清理事务.

当此清理工作分配给拥有 S 锁的同一个会话时,可能会发生无限期挂起.

为了避免这个问题,你需要停止引用 tempdb 中的 system 对象.

可以在不引用任何外部表的情况下创建数字表.以下不需要读取基表行,因此也不需要锁.

有十(N)个AS(SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALLSELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALLSELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1)SELECT TOP 1000000 IDENTITY(INT, 1, 1) 数字INTO号码从十个T10,十个T100,十个T1000,十个T10000,十个T100000,十 T1000000

重现步骤

先创建一个程序

CREATE PROC P作为设置无计数;声明@T 表(X INT)走

然后重新启动SQL服务并在一个连接中执行

虽然不存在(选择 *从 sys.dm_os_waiting_tasksWHERE session_id = blocks_session_id)开始/*这会导致有问题的droptemp事务*/EXEC sp_recompile 'P'执行程序结尾;选择 *从 sys.dm_os_waiting_tasksWHERE session_id = blocks_session_id

然后在另一个连接中运行

使用 tempdb;SELECT TOP 1000000 IDENTITY(INT, 1, 1) 数字进入#T从 sys.objects s1交叉连接 sys.objects s2交叉连接 sys.objects s3交叉连接 sys.objects s4;删除表#T

填充 Numbers 表的查询似乎设法通过清理临时对象(例如表变量)的内部系统事务进入实时锁定状态.

我设法以这种方式阻止了会话 ID 53.它被无限期阻止.sp_WhoIsActive 的输出表明该 spid 几乎所有时间都处于挂起状态.在连续运行中,reads 列中的数字增加,但其他列中的值基本保持不变.

等待持续时间没有显示出增加的模式,尽管表明它必须定期解除阻塞才能再次阻塞.

SELECT *从 sys.dm_os_waiting_tasksWHERE session_id = blocks_session_id

退货

+----------------------+------------+-----------------+-----------+-----------+---------------------+-----------------------+--------------------+----------------------------------------+--------------------------------------------------------------------------------------------------+|等待任务地址|session_id |exec_context_id |wait_duration_ms |等待类型 |资源地址|阻塞任务地址|blocks_session_id |阻塞_exec_context_id |资源描述|+--------------+------------+----------------+------------+-----------+--------------------+------------------------------+-----------------------+---------------------------+-------------------------------------------------------------------------------------------+|0x00000002F2C170C8 |53 |0 |86 |LCK_M_X |0x00000002F9B13040 |0x00000002F2C170C8 |53 |空 |keylock hobtid=281474978938880 dbid=2 id=lock2f9ac8880 mode=U associatedObjectId=281474978938880 |+--------------+------------+----------------+------------+-----------+--------------------+------------------------------+-----------------------+---------------------------+-------------------------------------------------------------------------------------------+

在资源描述中使用id

SELECT o.nameFROM sys.allocation_units au WITH (NOLOCK)INNER JOIN sys.partitions p WITH (NOLOCK)ON au.container_id = p.partition_idINNER JOIN sys.all_objects o WITH (NOLOCK)ON o.object_id = p.object_idWHERE allocation_unit_id = 281474978938880

退货

+------------+|姓名 |+------------+|sysschobjs |+------------+

运行

SELECT resource_description,request_status从 sys.dm_tran_locksWHERE request_session_id = 53 AND request_status <>'授予'

退货

+----------------------+----------------+|资源描述|request_status |+----------------------+----------------+|(246708db8c1f) |转换 |+----------------------+----------------+

通过 DAC 连接并运行

SELECT id,name从 tempdb.sys.sysschobjs WITH (NOLOCK)WHERE %%LOCKRES%% = '(246708db8c1f)'

退货

+-------------+-----------+|身份证 |姓名 |+------------+------------+|-1578606288 |#A1E86130 |+------------+------------+

好奇这是什么

SELECT name,user_type_id从 tempdb.sys.columnsWHERE object_id = -1578606288

退货

+------+--------------+|姓名 |user_type_id |+------+--------------+|X |56 |+------+--------------+

这是存储过程使用的表变量中的列名.

运行

SELECT request_mode,请求状态,request_session_id,request_owner_id,锁所有者地址,t.transaction_id,t.name,t.transaction_begin_timeFROM sys.dm_tran_locks l加入 sys.dm_tran_active_transactions tON l.request_owner_id = t.transaction_idWHERE resource_description = '(246708db8c1f)'

退货

+--------------+----------------+--------------------+------------------+--------------------+----------------+-------------+--------------------------+|request_mode |request_status |request_session_id |request_owner_id |lock_owner_address |交易ID |姓名 |transaction_begin_time |+--------------+----------------+-------------------+--------------------+--------------------+----------------+--------------+------------------------+|你|授予 |53 |227647 |0x00000002F1EF6800 |227647 |droptemp |2013-11-24 18:36:28.267 ||| |授予 |53 |191790 |0x00000002F9B16380 |191790 |选择进入 |2013-11-24 18:21:30.083 ||X |转换 |53 |227647 |0x00000002F9B12FC0 |227647 |droptemp |2013-11-24 18:36:28.267 |+--------------+----------------+-------------------+--------------------+--------------------+----------------+--------------+------------------------+

因此 SELECT INTO 事务在 tempdb.sys.sysschobjs 中与表变量 S 锁>#A1E86130.由于 S 锁冲突,droptemp 事务无法获得该行的 X 锁.

重复运行此查询会发现 droptemp 事务的 transaction_id 重复更改.

我推测 SQL Server 必须在用户 spid 上分配这些内部事务,并在执行用户工作之前对它们进行优先级排序.所以会话 id 53 被困在一个恒定的循环中,它启动一个 droptemp 事务,被在同一个 spid 上运行的用户事务阻塞.回滚内部事务,然后无限期地重复该过程.

这是通过在 spid 挂起后跟踪 SQL Server Profiler 中的各种锁定和事务事件而得到证实的.

在此之前,我还跟踪了锁定事件.

锁定事件阻止

SELECT INTO 事务在 sysschobjs 中的键上取出的大多数共享键锁会立即释放.例外是 (246708db8c1f) 上的第一个锁.

这是有道理的,因为该计划显示了对 [sys].[sysschobjs].[clst] [o] 的嵌套循环扫描,并且由于临时对象获得了负 objectid,因此它们将成为第一行按扫描顺序遇到.

我也遇到过 OP 中描述的情况,首先运行三路交叉连接似乎允许四路交叉连接成功.

SELECT INTO 事务跟踪中的前几个事件有一个完全不同的模式.

这是在服务重启之后,所以文本数据列中的锁资源值不能直接比较.

不是保留第一个键上的锁,然后获取和释放后续键的模式,它似乎在最初不释放它们的情况下获取更多锁.

我认为在避免这个问题的执行策略上一定存在一些差异.

<小时>

更新

我就此提出的连接项 尚未标记为已修复,但我现在使用的是 SQL Server 2012 SP2,现在只能重现临时自阻塞而不是永久阻塞.我仍然得到自我阻塞,但在成功执行 droptemp 事务的尝试失败后,它似乎又回到处理用户事务了.之后提交系统事务,然后成功执行.仍然在同一个 spid 上.(在一个示例中运行了 8 次尝试.我不确定这是否会一直重复)

Ok. Here's what I try to run:

USE tempdb;

SELECT TOP 1000000 IDENTITY(INT, 1, 1) Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
CROSS JOIN sys.objects s3
CROSS JOIN sys.objects s4;

This is one of those "make me a numbers table" queries.

Here's the problem. If I run this immediately after the SQL Server service is (re)started, it takes forever. Not forever as in ten seconds and I want it faster.  Forever as in, I let it go over two hours once by accident and still had to kill it. I'm thinking it just never ever comes back. And ordinarily it takes under two seconds on my machine to run this.

However, if I do this instead:

USE tempdb;

SELECT TOP 1000000 IDENTITY(INT, 1, 1) Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
CROSS JOIN sys.objects s3;

DROP TABLE Numbers;

SELECT TOP 1000000 IDENTITY(INT, 1, 1) Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
CROSS JOIN sys.objects s3
CROSS JOIN sys.objects s4;

Then it works as you would expect — the first SELECT runs in under two seconds, as does the second. Why don't I just use the three-table version? Because there aren't enough entries in sys.objects for that number cubed to equal a million result rows. But that's not even the point anymore.

Anyway, from here on, I can repeat that second DROP / SELECT…INTO as much as I want, no problem. Somehow that first three-table version made it ok forever. At least, till the next time the service is restarted and/or the machine rebooted. At which point, running that last SELECT again never comes back. Again.

Here's where it starts getting even weirder. If I pare that first SELECT back to a two-table version:

USE tempdb;

SELECT TOP 1000000 IDENTITY(INT, 1, 1) Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2;

DROP TABLE Numbers;

SELECT TOP 1000000 IDENTITY(INT, 1, 1) Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
CROSS JOIN sys.objects s3
CROSS JOIN sys.objects s4;

This also makes the second SELECT run forever. As does a one-table version.  Somehow, that three-table version is magical!

What is going on here?  Why is this slow?

(And before anyone points out that I'm creating a permanent table in tempdb, yes, I know. Changing to actual temp tables doesn't make any difference.)


Added info:

  • This is SQL Server 2012 Developer Edition
  • Output of EXEC sp_WhoIsActive @find_block_leaders = 1, @sort_order = '[blocked_session_count] DESC' (scripted as XML so it can be read here) is:

<?xml version="1.0" ?>
<RESULTS1>
    <RECORD>
        <dd hh:mm:ss.mss>00 00:10:45.066</dd hh:mm:ss.mss>
        <session_id>52</session_id>
        <sql_text>&lt;?query --
SELECT TOP 1000000 IDENTITY(INT, 1, 1) Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
CROSS JOIN sys.objects s3
CROSS JOIN sys.objects s4;

--?&gt;</sql_text>
        <login_name>my own login name redacted</login_name>
        <wait_info>(99ms)LCK_M_X</wait_info>
        <CPU>              9,750</CPU>
        <tempdb_allocations>                713</tempdb_allocations>
        <tempdb_current>                702</tempdb_current>
        <blocking_session_id>NULL</blocking_session_id>
        <blocked_session_count>                  0</blocked_session_count>
        <reads>            583,273</reads>
        <writes>                537</writes>
        <physical_reads>                 50</physical_reads>
        <used_memory>                  3</used_memory>
        <status>suspended</status>
        <open_tran_count>                  2</open_tran_count>
        <percent_complete>NULL</percent_complete>
        <host_name>my own machine name redacted</host_name>
        <database_name>tempdb</database_name>
        <program_name>Microsoft SQL Server Management Studio - Query</program_name>
        <start_time>2013-11-23 23:48:19.473</start_time>
        <login_time>2013-11-23 23:47:47.060</login_time>
        <request_id>0</request_id>
        <collection_time>2013-11-23 23:59:04.560</collection_time>
    </RECORD>
</RESULTS1>


More added info:

Why I'm putting this in tempdb is that it's part of a script intended to be run on virgin installations, and tempdb is guaranteed to be there. As I said, changing to global temp tables does no different.

解决方案

I can could also reproduce this 100% of the time on my machine. (see note at end)

The gist of the problem is that you are taking out S locks on system table rows in tempdb that can conflict with the locks needed for internal tempdb cleanup transactions.

When this clean up work is allocated to the same session that owns the S lock an indefinite hang can occur.

To avoid this problem for certain you need to stop referencing the system objects inside tempdb.

It is possible to create a numbers table without referencing any external tables at all. The following needs to read no base table rows and thus also takes no locks.

WITH Ten(N) AS 
(
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)   
SELECT TOP 1000000 IDENTITY(INT, 1, 1) Number
INTO   Numbers
FROM   Ten T10,
       Ten T100,
       Ten T1000,
       Ten T10000,
       Ten T100000,
       Ten T1000000 

Steps to Reproduce

First create a procedure

CREATE PROC P
AS
    SET NOCOUNT ON;

    DECLARE @T TABLE (X INT)
GO

Then restart the SQL Service and in one connection execute

WHILE NOT EXISTS(SELECT *
                 FROM   sys.dm_os_waiting_tasks
                 WHERE  session_id = blocking_session_id)
  BEGIN

      /*This will cause the problematic droptemp transactions*/
      EXEC sp_recompile 'P'

      EXEC P
  END;

SELECT *
FROM   sys.dm_os_waiting_tasks
WHERE  session_id = blocking_session_id 

Then in another connection run

USE tempdb;

SELECT TOP 1000000 IDENTITY(INT, 1, 1) Number
INTO #T
FROM sys.objects s1
CROSS JOIN sys.objects s2
CROSS JOIN sys.objects s3
CROSS JOIN sys.objects s4;

DROP TABLE #T

The query populating the Numbers table seems to manage to get into a live lock situation with the internal system transactions that clean up temporary objects such as table variables.

I managed to get session id 53 blocked in this manner. It is blocked indefinitely. The output of sp_WhoIsActive shows that this spid spends almost all of the time suspended. In consecutive runs the numbers in the reads column increases but the values in the other columns remain largely the same.

The wait duration doesn't show an increasing pattern though indicating that it must get unblocked periodically before getting blocked again.

SELECT *
FROM   sys.dm_os_waiting_tasks
WHERE  session_id = blocking_session_id

Returns

+----------------------+------------+-----------------+------------------+-----------+--------------------+-----------------------+---------------------+--------------------------+--------------------------------------------------------------------------------------------------+
| waiting_task_address | session_id | exec_context_id | wait_duration_ms | wait_type |  resource_address  | blocking_task_address | blocking_session_id | blocking_exec_context_id |                                       resource_description                                       |
+----------------------+------------+-----------------+------------------+-----------+--------------------+-----------------------+---------------------+--------------------------+--------------------------------------------------------------------------------------------------+
| 0x00000002F2C170C8   |         53 |               0 |               86 | LCK_M_X   | 0x00000002F9B13040 | 0x00000002F2C170C8    |                  53 | NULL                     | keylock hobtid=281474978938880 dbid=2 id=lock2f9ac8880 mode=U associatedObjectId=281474978938880 |
+----------------------+------------+-----------------+------------------+-----------+--------------------+-----------------------+---------------------+--------------------------+--------------------------------------------------------------------------------------------------+

Using the id in the resource description

SELECT o.name
FROM   sys.allocation_units au WITH (NOLOCK)
       INNER JOIN sys.partitions p WITH (NOLOCK)
         ON au.container_id = p.partition_id
       INNER JOIN sys.all_objects o WITH (NOLOCK)
         ON o.object_id = p.object_id
WHERE  allocation_unit_id = 281474978938880 

Returns

+------------+
|    name    |
+------------+
| sysschobjs |
+------------+

Running

SELECT resource_description,request_status
FROM   sys.dm_tran_locks 
WHERE request_session_id = 53 AND request_status <> 'GRANT'

Returns

+----------------------+----------------+
| resource_description | request_status |
+----------------------+----------------+
| (246708db8c1f)       | CONVERT        |
+----------------------+----------------+

Connecting via the DAC and running

SELECT id,name
FROM   tempdb.sys.sysschobjs WITH (NOLOCK)
WHERE %%LOCKRES%% = '(246708db8c1f)' 

Returns

+-------------+-----------+
|     id      |   name    |
+-------------+-----------+
| -1578606288 | #A1E86130 |
+-------------+-----------+

Curious about what that is

SELECT name,user_type_id
FROM tempdb.sys.columns
WHERE object_id = -1578606288 

Returns

+------+--------------+
| name | user_type_id |
+------+--------------+
| X    |           56 |
+------+--------------+

This is the column name in the table variable used by the stored proc.

Running

SELECT request_mode,
       request_status,
       request_session_id,
       request_owner_id,
       lock_owner_address,
       t.transaction_id,
       t.name,
       t.transaction_begin_time
FROM   sys.dm_tran_locks l
       JOIN sys.dm_tran_active_transactions t
         ON l.request_owner_id = t.transaction_id
WHERE  resource_description = '(246708db8c1f)' 

Returns

+--------------+----------------+--------------------+------------------+--------------------+----------------+-------------+-------------------------+
| request_mode | request_status | request_session_id | request_owner_id | lock_owner_address | transaction_id |    name     | transaction_begin_time  |
+--------------+----------------+--------------------+------------------+--------------------+----------------+-------------+-------------------------+
| U            | GRANT          |                 53 |           227647 | 0x00000002F1EF6800 |         227647 | droptemp    | 2013-11-24 18:36:28.267 |
| S            | GRANT          |                 53 |           191790 | 0x00000002F9B16380 |         191790 | SELECT INTO | 2013-11-24 18:21:30.083 |
| X            | CONVERT        |                 53 |           227647 | 0x00000002F9B12FC0 |         227647 | droptemp    | 2013-11-24 18:36:28.267 |
+--------------+----------------+--------------------+------------------+--------------------+----------------+-------------+-------------------------+

So the SELECT INTO transaction is holding an S lock on the row in tempdb.sys.sysschobjs pertaining to the table variable #A1E86130. The droptemp transaction cannot get an X lock on this row because of this conflicting S lock.

Running this query repeatedly reveals that the transaction_id for the droptemp transaction repeatedly changes.

I speculate that SQL Server must allocate these internal transactions on user spids and prioritise them before doing the user work. So the session id 53 is stuck in a constant cycle where it starts up a droptemp transaction, is blocked by the user transaction running on the same spid. Rolls back the internal transaction then repeats the process indefinitely.

This is borne out by tracing the various locking and transaction events in SQL Server Profiler after the spid becomes hung.

I also traced the locking events prior to that.

Lock Events Blocking

Most of the shared key locks taken out by the SELECT INTO transaction on keys in sysschobjs get released immediately. The exception is the first lock on (246708db8c1f).

This makes some sense as the plan shows nested loops scans of [sys].[sysschobjs].[clst] [o] and because temporary objects get given negative objectids they will be the first rows encountered in scan order.

I also encountered the situation described in the OP where running a three way cross join first seems to allow the four way one to succeed.

The first few events in the trace for the SELECT INTO transaction there are an entirely different pattern.

This was after a service restart so the lock resource values in the text data column are not directly comparable.

Instead of retaining the lock on the first key and then a pattern of acquiring and releasing subsequent keys it seems to acquire a lot more locks without releasing them initially.

I presume there must be some variance in execution strategy that avoids the issue.


Update

The Connect Item I raised about this has not been marked as fixed but I am now on SQL Server 2012 SP2 and can now only reproduce temporary self blocking rather than permanent. I still get the self blocking but after some amount of failed attempts to execute the droptemp transaction successfully it appears to go back to processing the user transaction. After that commits the system transaction then is executed successfully. Still on the same spid. (eight attempts in one example run. I'm not sure if this will be consistently repeated)

这篇关于为什么第一次启动服务后这个查询很慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆