测试Azure SQL DW并发显示内部Azure并未同时运行过程 [英] Testing Azure SQL DW Concurrency shows internally Azure is not running procedures concurrently
问题描述
我已阅读SQL数据仓库中的Azure并发和工作负载管理
https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-develop-concurrency/
并了解限制放置在并发查询的数量基于规模可以使用,但我不能理解,因此提出这里的问题是当测试什么文档状态,我不能得到结果Azure声明是真实的。
I have read the Azure Concurrency and workload management in SQL Data Warehouse https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-develop-concurrency/ and understand the limits placed on the number of concurrent queries one can use based on scale, but what I cannot understand and thus raising the question here is when testing what the document states, I cannot get the results Azure claims are true. Running queries concurrently is still taking almost as long as running them serially.
下面是一个测试示例(只是一个测试)
我有5个存储过程当单独运行它们需要大约1秒钟来完成。所以当我连续运行所有5,他们大约需要5秒,这是预期,但是当我同时运行所有5个sprocs,我希望他们完成一个略多于1秒,但他们需要大约4.5-4.7秒完成。
For examples Here is a test example (just a test) I have 5 stored procedures that when run individually they take around 1 second each to complete. So when I run all 5 serially they take around 5 seconds, this is expected but when I run all 5 sprocs concurrently I would expect them to complete in a little over 1 second but instead they take about 4.5-4.7 seconds to complete.
有些Azure专家可以解释可能发生的情况吗?
Can some Azure expert explain what could be going on?
我认为这可能是资源争用, .dm_pdw_resource_waits显示当5个sprocs运行时没有阻塞。
I thought it could be resource contention but sys.dm_pdw_resource_waits shows no blockage while the 5 sprocs are running.
当我运行sys.dm_pdw_exec_requests时,我看到所有5个exec sproc查询提交了几个毫秒。对于Start_time和End_compile_time也是如此。所有5个sproc的end_time也在几毫秒内,但Total_elapsed_time接近5000毫秒,而不是预期的1000毫秒。如果我运行任何sproc本身,持续时间约为1000毫秒。它好像并发将同时启动所有5个sprocs,但在内部他们排队并按顺序运行。我最初测试一个DW200有8个插槽,这应该足够我的5个sprocs。为了安全我缩小到DW1000它允许我多达32个并发查询(我使用smallrc),但没有帮助这个问题。
When I run sys.dm_pdw_exec_requests I see all 5 exec sproc queries being submitted withing a few ms. Same is true for Start_time and End_compile_time. the end_time for all 5 sprocs is again within a couple ms but the Total_elapsed_time is closer to 5000 ms instead of the expected 1000 ms. If I run any sproc by itself the duration is around 1000 ms. Its as if Concurrency will start all 5 sprocs at the same time but internally they are queued up and run sequentially. I was originally testing on a DW200 which has 8 slots which should be enough for my 5 sprocs. To be safe I scaled out to a DW1000 which allows me up to 32 concurrent queries (I am using smallrc) but that did not help this issue.
这是我如何测试这个(使用DW1000)
Here is how I tested this (using DW1000)
-
我将1000条记录加载到5个单独的阶段表中
(stage1,stage2等)
I loaded 1000 records into 5 separate stage tables (stage1,stage2,etc..)
CREATE TABLE dbo.Stage1
(
ShortId bigint NOT NULL
,TestName varchar(50) NOT NULL
,TestValue varchar(50) NOT NULL
,CreateDate DateTime NOT NULL
)
WITH
(
DISTRIBUTION = HASH (ShortId)
)
我创建了5个事实表(fact1,fact2,etc ..)每个表有
相同的4列作为stage,并使用散列在第一个
列。我没有包括列存储索引(记住这只是
a测试)
I created 5 fact tables (fact1,fact2, etc..) each table has the same 4 columns as stage and is distributed using hash on the first column. I did not include a columnstore index (remember this is only a test)
CREATE TABLE dbo.Fact1
(
ShortId bigint NOT NULL
,TestName varchar(50) NOT NULL
,TestValue varchar(50) NOT NULL
,CreateDate DateTime NOT NULL
)
WITH
(
DISTRIBUTION = HASH (ShortId)
)
我创建了5个存储过程,从
阶段插入数据。
I created 5 stored procedures that inserts data into fact from stage.
CREATE PROCEDURE dbo.TestLoad1
AS
BEGIN
INSERT INTO dbo.Fact1 --this is dbo.Fact2 in sproc 2 etc...
SELECT
stg.ShortId
,stg.PropertyName
,stg.PropertyValue
,stg.AcquistionTime
FROM dbo.Stage1 stg
WHERE stg.ShortId NOT IN (SELECT ShortId from dbo.Fact1) --Fact2 etc..
END
在C#中创建了一个快速测试方法, 5连接,
命令并使用BeginExecuteReader / EndExecuteReader执行
sproc。 (这只是一个测试,所以原谅样式/代码)
In C# I created a quick test method that creates 5 connections, commands and uses BeginExecuteReader/EndExecuteReader to execute the sproc. (this is just a test, so forgive the style/code)
SqlConnection cnn1 = new SqlConnection("Data Source=<server>;Initial Catalog=<database>;Persist Security Info = True;User ID =<username>;Password = <password>;Pooling = False;MultipleActiveResultSets = False;Connect Timeout = 30;Encrypt = True;TrustServerCertificate = False");
SqlConnection cnn2 = new SqlConnection("Data Source=<server>;Initial Catalog=<database>;Persist Security Info = True;User ID =<username>;Password = <password>;Pooling = False;MultipleActiveResultSets = False;Connect Timeout = 30;Encrypt = True;TrustServerCertificate = False");
SqlConnection cnn3 = new SqlConnection("Data Source=<server>;Initial Catalog=<database>;Persist Security Info = True;User ID =<username>;Password = <password>;Pooling = False;MultipleActiveResultSets = False;Connect Timeout = 30;Encrypt = True;TrustServerCertificate = False");
SqlConnection cnn4 = new SqlConnection("Data Source=<server>;Initial Catalog=<database>;Persist Security Info = True;User ID =<username>;Password = <password>;Pooling = False;MultipleActiveResultSets = False;Connect Timeout = 30;Encrypt = True;TrustServerCertificate = False");
SqlConnection cnn5 = new SqlConnection("Data Source=<server>;Initial Catalog=<database>;Persist Security Info = True;User ID =<username>;Password = <password>;Pooling = False;MultipleActiveResultSets = False;Connect Timeout = 30;Encrypt = True;TrustServerCertificate = False");
SqlCommand cmd1;
SqlCommand cmd2;
SqlCommand cmd3;
SqlCommand cmd4;
SqlCommand cmd5;
IAsyncResult result1;
IAsyncResult result2;
IAsyncResult result3;
IAsyncResult result4;
IAsyncResult result5;
SqlDataReader reader1;
SqlDataReader reader2;
SqlDataReader reader3;
SqlDataReader reader4;
SqlDataReader reader5;
cnn1.Open();
cnn2.Open();
cnn3.Open();
cnn4.Open();
cnn5.Open();
cmd1 = new SqlCommand("dbo.TestLoad1", cnn1);
cmd2 = new SqlCommand("dbo.TestLoad2", cnn2);
cmd3 = new SqlCommand("dbo.TestLoad3", cnn3);
cmd4 = new SqlCommand("dbo.TestLoad4", cnn4);
cmd5 = new SqlCommand("dbo.TestLoad5", cnn5);
cmd1.CommandType = CommandType.StoredProcedure;
cmd2.CommandType = CommandType.StoredProcedure;
cmd3.CommandType = CommandType.StoredProcedure;
cmd4.CommandType = CommandType.StoredProcedure;
cmd5.CommandType = CommandType.StoredProcedure;
result1 = cmd1.BeginExecuteReader(CommandBehavior.SingleRow);
result2 = cmd2.BeginExecuteReader(CommandBehavior.SingleRow);
result3 = cmd3.BeginExecuteReader(CommandBehavior.SingleRow);
result4 = cmd4.BeginExecuteReader(CommandBehavior.SingleRow);
result5 = cmd5.BeginExecuteReader(CommandBehavior.SingleRow);
reader1 = cmd1.EndExecuteReader(result1); //this is where the code waits for 5 seconds
reader2 = cmd2.EndExecuteReader(result2);
reader3 = cmd3.EndExecuteReader(result3);
reader4 = cmd4.EndExecuteReader(result4);
reader5 = cmd5.EndExecuteReader(result5);
reader1.Close();
reader2.Close();
reader3.Close();
reader4.Close();
reader5.Close();
调试此C#代码时,每个语句都是< ; 1ms直到我得到线reader1 = cmd1.EndExecuteReader(result1);
这里它将等待4-5秒,然后移动,每一行后再次快速(< 1ms)。
When debugging this C# code each statement is < 1ms until I get to the line reader1 = cmd1.EndExecuteReader(result1); Here it will wait for 4-5 seconds then move on and every line after is again quick (<1ms).
在此延迟期间如果我运行select *从sys.dm_pdw_exec_requests我看到所有5个请求排队和运行。如果我继续重新运行查询持续增加,然后突然(大约5秒)所有5个查询说他们完成。
During that delay if I run select * from sys.dm_pdw_exec_requests I see all 5 requests are queued up and running. If I continue to re-run the query duration keeps increasing then all of sudden (around 5 seconds) all 5 queries say they completed.
任何帮助将在解释我做错了或Azure SQL DW在内部做什么感谢。
Any help would be appreciated in explaining what I am doing wrong or what Azure SQL DW is doing internally.
谢谢
推荐答案
SQL Data Warehouse允许单一查询以利用单个查询来利用所有CPU或IO。当查询能够充分利用资源时,添加另一个同样竞争同一资源的查询将意味着两者都运行得更慢。也就是说,如果有一个查询使用100%的CPU并运行另一个也同时使用100%的查询,那么这两个查询将需要两倍的时间。这样做的好处是单个查询将尽可能快地运行,并且竞争不同资源的两个查询也将尽可能快地运行。当你运行一个像上面的测试,其中所有的查询本质上是相同的,预期串行或并发运行测试将需要相同的时间量。
SQL Data Warehouse allows for a single query to utilize all CPU or IO with a single query. When query is able to fully utilize a resource, adding another query which also contends for the same resource will mean that both will run slower. That is, if you have a query which uses 100% of CPU and run another query which also uses 100% concurrently, both queries will take twice as long. The benefit of this is that a single query will run as fast as possible and two queries which contend for different resources will also run as fast as possible. When you run a test like the one above, where all queries are essentially identical, it would be expected that running the tests serially or concurrently will take the same amount of time.
要进一步调查结果,您可能会发现监测文章有益。除了查看sys.dm_pdw_exec_requests,尝试看看sys.dm_pdw_request_steps(通过分布式SQL步骤的执行时间),我预计5秒钟的大部分时间以及sys.dm_pdw_sql_requests(分布的执行时间) )。
To further investigate your results, you might find the monitoring article beneficial. In addition to looking at sys.dm_pdw_exec_requests, try taking a look at sys.dm_pdw_request_steps (execution time by Distributed SQL step), which I would expect to account for the majority of the 5 seconds as well as sys.dm_pdw_sql_requests (execution time by distribution).
BTW,上面提到你没有为你的测试添加一个columnstore索引。在SQL DW中,默认表类型为clustered columnstore 。要删除此索引,请按如下方式更改WITH子句:
BTW, above you mentioned that you didn't add a columnstore index for your test. In SQL DW the default table type is clustered columnstore. To remove this index, change your WITH clause as follows...
WITH
(
DISTRIBUTION = HASH ShortId),HEAP
)
WITH ( DISTRIBUTION = HASH (ShortId), HEAP )
希望这有帮助。
这篇关于测试Azure SQL DW并发显示内部Azure并未同时运行过程的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!