使用SqlCommand Async方法处理大数据的性能糟糕 [英] Horrible performance using SqlCommand Async methods with large data

查看:74
本文介绍了使用SqlCommand Async方法处理大数据的性能糟糕的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用异步调用时,我遇到了主要的SQL性能问题.我创建了一个小案例来说明问题.

I'm having major SQL performance problems when using async calls. I have created a small case to demonstrate the problem.

我已经在SQL Server 2016上创建了一个数据库,该数据库位于我们的LAN中(而不是localDB).

I have create a database on a SQL Server 2016 which resides in our LAN (so not a localDB).

在该数据库中,我有一个具有两列的表WorkingCopy:

In that database, I have a table WorkingCopy with 2 columns:

Id (nvarchar(255, PK))
Value (nvarchar(max))

DDL

CREATE TABLE [dbo].[Workingcopy]
(
    [Id] [nvarchar](255) NOT NULL, 
    [Value] [nvarchar](max) NULL, 

    CONSTRAINT [PK_Workingcopy] 
        PRIMARY KEY CLUSTERED ([Id] ASC)
                    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
                          IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, 
                          ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

在该表中,我插入了一条记录(id ='PerfUnitTest',Value是1.5mb的字符串(较大的JSON数据集的zip)).

In that table, I have inserted a single record (id='PerfUnitTest', Value is a 1.5mb string (a zip of a larger JSON dataset)).

现在,如果我在SSMS中执行查询:

Now, if I execute the query in SSMS :

SELECT [Value] 
FROM [Workingcopy] 
WHERE id = 'perfunittest'

我立即获得结果,并且在SQL Servre Profiler中看到执行时间约为20毫秒.一切正常.

I immediately get the result, and I see in SQL Servre Profiler that the execution time was around 20 milliseconds. All normal.

使用简单的SqlConnection从.NET(4.6)代码执行查询时:

When executing the query from .NET (4.6) code using a plain SqlConnection :

// at this point, the connection is already open
var command = new SqlCommand($"SELECT Value FROM WorkingCopy WHERE Id = @Id", _connection);
command.Parameters.Add("@Id", SqlDbType.NVarChar, 255).Value = key;

string value = command.ExecuteScalar() as string;

此操作的执行时间也是20到30毫秒左右.

The execution time for this is also around 20-30 milliseconds.

但是将其更改为异步代码时:

But when changing it to async code :

string value = await command.ExecuteScalarAsync() as string;

执行时间突然 1800 ms !同样在SQL Server Profiler中,我看到查询执行时间超过一秒.尽管探查器报告的已执行查询与非异步版本完全相同.

The execution time is suddenly 1800 ms ! Also in SQL Server Profiler, I see that the query execution duration is more than a second. Although the executed query reported by the profiler is exactly the same as the non-Async version.

但是情况变得更糟.如果我在连接字符串中使用数据包大小,则会得到以下结果:

But it gets worse. If I play around with the Packet Size in the connection string, I get the following results :

数据包大小32768:[TIMING]:SqlValueStore中的ExecuteScalarAsync-> 经过时间:450毫秒

Packet size 32768 : [TIMING]: ExecuteScalarAsync in SqlValueStore -> elapsed time : 450 ms

数据包大小4096:[TIMING]:SqlValueStore中的ExecuteScalarAsync-> 耗用时间:3667毫秒

Packet Size 4096 : [TIMING]: ExecuteScalarAsync in SqlValueStore -> elapsed time : 3667 ms

数据包大小512:[TIMING]:SqlValueStore中的ExecuteScalarAsync-> 经过时间:30776毫秒

Packet size 512 : [TIMING]: ExecuteScalarAsync in SqlValueStore -> elapsed time : 30776 ms

30,000毫秒!这比非异步版本慢1000倍.并且SQL Server Profiler报告查询执行花费了10秒钟以上.甚至都无法解释其他20秒的去向!

30,000 ms!! That's over a 1000x slower than the non-async version. And SQL Server Profiler reports that the query execution took over 10 seconds. That doesn't even explain where the other 20 seconds are gone to!

然后,我又切换回同步版本,并且也使用了Packet Size,虽然它确实影响了一些执行时间,但与异步版本相比,它没有那么引人注目.

Then I've switched back to the sync version and also played around with the Packet Size, and although it did impact a little the execution time, it was nowhere as dramatic as with the async version.

作为一个旁注,如果仅在值中放入一个小的字符串(<100字节),则异步查询的执行速度与同步版本一样快(结果为1或2毫秒).

As a sidenote, if it put just a small string (< 100bytes) into the value, the async query execution is just as fast as the sync version (result in 1 or 2 ms).

我对此感到非常困惑,尤其是因为我使用的是内置的SqlConnection,甚至没有使用ORM.另外,在四处搜寻时,我什么也没找到能解释这种现象的东西.有什么想法吗?

I'm really baffled by this, especially since I'm using the built-in SqlConnection, not even an ORM. Also when searching around, I found nothing which could explain this behavior. Any ideas?

推荐答案

在负载不大的系统上,异步调用的开销会稍大.尽管I/O操作本身是异步的,但阻塞比线程池任务切换要快.

On a system without significant load, an async call has a slightly bigger overhead. While the I/O operation itself is asynchronous regardless, blocking can be faster than thread-pool task switching.

多少开销?让我们看看您的计时号码.阻塞呼叫为30毫秒,异步呼叫为450毫秒. 32 kiB数据包大小意味着您需要大约五十个单独的I/O操作.这意味着每个数据包大约有8ms的开销,这与您对不同数据包大小的测量结果非常吻合.即使异步版本需要比同步版本做更多的工作,这听起来也不是仅仅因为异步而产生的开销.听起来同步版本是(简化)1个请求-> 50个响应,而异步版本最终是1个请求-> 1个响应-> 1个请求-> 1个响应-> ...,这是一遍又一遍地付出的代价再次.

How much overhead? Let's look at your timing numbers. 30ms for a blocking call, 450ms for an asynchronous call. 32 kiB packet size means you need you need about fifty individual I/O operations. That means we have roughly 8ms of overhead on each packet, which corresponds pretty well with your measurements over different packet sizes. That doesn't sound like overhead just from being asynchronous, even though the asynchronous versions need to do a lot more work than the synchronous. It sounds like the synchronous version is (simplified) 1 request -> 50 responses, while the asynchronous version ends up being 1 request -> 1 response -> 1 request -> 1 response -> ..., paying the cost over and over again.

更深入. ExecuteReaderExecuteReaderAsync一样工作.下一个操作是Read,后跟GetFieldValue-有趣的事情在那里发生.如果两者之一异步,则整个操作很慢.因此,一旦开始使事情真正异步,肯定会有非常发生变化-Read会很快,然后异步GetFieldValueAsync会很慢,或者您可以从慢速的,然后GetFieldValueGetFieldValueAsync都很快.从流中进行的第一个异步读取很慢,并且该慢度完全取决于整个行的大小.如果我添加更多相同大小的行,则读取每一行所花费的时间就好像我只有一行一样,因此很明显,数据 仍在逐行流式传输-只是开始执行 any 异步读取后,似乎更喜欢一次读取整行.如果我异步读取第一行,然后异步读取第二行,那么正在读取的第二行将再次快速读取.

Going deeper. ExecuteReader works just as well as ExecuteReaderAsync. The next operation is Read followed by a GetFieldValue - and an interesting thing happens there. If either of the two is async, the whole operation is slow. So there's certainly something very different happening once you start making things truly asynchronous - a Read will be fast, and then the async GetFieldValueAsync will be slow, or you can start with the slow ReadAsync, and then both GetFieldValue and GetFieldValueAsync are fast. The first asynchronous read from the stream is slow, and the slowness depends entirely on the size of the whole row. If I add more rows of the same size, reading each row takes the same amount of time as if I only have one row, so it's obvious that the data is still being streamed row by row - it just seems to prefer to read the whole row at once once you start any asynchronous read. If I read the first row asynchronously, and the second synchronously - the second row being read will be fast again.

所以我们可以看到问题出在单个行和/或列的大小很大.总共有多少数据都没有关系-异步读取一百万个小行与同步一样快.但是,仅添加一个太大而无法容纳在单个数据包中的字段,就会神秘地异步读取该数据而产生成本-好像每个数据包都需要一个单独的请求数据包,并且服务器不能只在以下位置发送所有数据一次.使用CommandBehavior.SequentialAccess确实可以达到预期的性能,但是同步和异步之间仍然存在巨大差距.

So we can see that the problem is a big size of an individual row and/or column. It doesn't matter how much data you have in total - reading a million small rows asynchronously is just as fast as synchronously. But add just a single field that's too big to fit in a single packet, and you mysteriously incur a cost at asynchronously reading that data - as if each packet needed a separate request packet, and the server couldn't just send all the data at once. Using CommandBehavior.SequentialAccess does improve the performance as expected, but the massive gap between sync and async still exists.

我获得的最佳性能是正确完成整个过程.这意味着使用CommandBehavior.SequentialAccess,以及显式传输数据:

The best performance I got was when doing the whole thing properly. That means using CommandBehavior.SequentialAccess, as well as streaming the data explicitly:

using (var reader = await cmd.ExecuteReaderAsync(CommandBehavior.SequentialAccess))
{
  while (await reader.ReadAsync())
  {
    var data = await reader.GetTextReader(0).ReadToEndAsync();
  }
}

因此,同步和异步之间的差异变得难以衡量,并且更改数据包大小不再像以前那样引起可笑的开销.

With this, the difference between sync and async becomes hard to measure, and changing the packet size no longer incurs the ridiculous overhead as before.

如果要在极端情况下保持良好性能,请确保使用可用的最佳工具-在这种情况下,请流式处理大型列数据,而不要依赖诸如ExecuteScalarGetFieldValue的助手.

If you want good performance in edge cases, make sure to use the best tools available - in this case, stream large column data rather than relying on helpers like ExecuteScalar or GetFieldValue.

这篇关于使用SqlCommand Async方法处理大数据的性能糟糕的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆