测试Azure SQL DW并发显示内部Azure并未同时运行过程 [英] Testing Azure SQL DW Concurrency shows internally Azure is not running procedures concurrently

查看:477
本文介绍了测试Azure SQL DW并发显示内部Azure并未同时运行过程的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已阅读SQL数据仓库中的Azure并发和工作负载管理
https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-develop-concurrency/
并了解限制放置在并发查询的数量基于规模可以使用,但我不能理解,因此提出这里的问题是当测试什么文档状态,我不能得到结果Azure声明是真实的。

I have read the Azure Concurrency and workload management in SQL Data Warehouse https://azure.microsoft.com/en-us/documentation/articles/sql-data-warehouse-develop-concurrency/ and understand the limits placed on the number of concurrent queries one can use based on scale, but what I cannot understand and thus raising the question here is when testing what the document states, I cannot get the results Azure claims are true. Running queries concurrently is still taking almost as long as running them serially.

下面是一个测试示例(只是一个测试)
我有5个存储过程当单独运行它们需要大约1秒钟来完成。所以当我连续运行所有5,他们大约需要5秒,这是预期,但是当我同时运行所有5个sprocs,我希望他们完成一个略多于1秒,但他们需要大约4.5-4.7秒完成。

For examples Here is a test example (just a test) I have 5 stored procedures that when run individually they take around 1 second each to complete. So when I run all 5 serially they take around 5 seconds, this is expected but when I run all 5 sprocs concurrently I would expect them to complete in a little over 1 second but instead they take about 4.5-4.7 seconds to complete.

有些Azure专家可以解释可能发生的情况吗?

Can some Azure expert explain what could be going on?

我认为这可能是资源争用, .dm_pdw_resource_waits显示当5个sprocs运行时没有阻塞。

I thought it could be resource contention but sys.dm_pdw_resource_waits shows no blockage while the 5 sprocs are running.

当我运行sys.dm_pdw_exec_requests时,我看到所有5个exec sproc查询提交了几个毫秒。对于Start_time和End_compile_time也是如此。所有5个sproc的end_time也在几毫秒内,但Total_elapsed_time接近5000毫秒,而不是预期的1000毫秒。如果我运行任何sproc本身,持续时间约为1000毫秒。它好像并发将同时启动所有5个sprocs,但在内部他们排队并按顺序运行。我最初测试一个DW200有8个插槽,这应该足够我的5个sprocs。为了安全我缩小到DW1000它允许我多达32个并发查询(我使用smallrc),但没有帮助这个问题。

When I run sys.dm_pdw_exec_requests I see all 5 exec sproc queries being submitted withing a few ms. Same is true for Start_time and End_compile_time. the end_time for all 5 sprocs is again within a couple ms but the Total_elapsed_time is closer to 5000 ms instead of the expected 1000 ms. If I run any sproc by itself the duration is around 1000 ms. Its as if Concurrency will start all 5 sprocs at the same time but internally they are queued up and run sequentially. I was originally testing on a DW200 which has 8 slots which should be enough for my 5 sprocs. To be safe I scaled out to a DW1000 which allows me up to 32 concurrent queries (I am using smallrc) but that did not help this issue.

这是我如何测试这个(使用DW1000)

Here is how I tested this (using DW1000)


  1. 我将1000条记录加载到5个单独的阶段表中
    (stage1,stage2等)

  1. I loaded 1000 records into 5 separate stage tables (stage1,stage2,etc..)

CREATE TABLE dbo.Stage1
(
     ShortId bigint NOT NULL
    ,TestName varchar(50) NOT NULL
    ,TestValue varchar(50) NOT NULL
    ,CreateDate DateTime NOT NULL 
)
WITH
(
    DISTRIBUTION = HASH (ShortId)
)


  • 我创建了5个事实表(fact1,fact2,etc ..)每个表有
    相同的4列作为stage,并使用散列在第一个
    列。我没有包括列存储索引(记住这只是
    a测试)

  • I created 5 fact tables (fact1,fact2, etc..) each table has the same 4 columns as stage and is distributed using hash on the first column. I did not include a columnstore index (remember this is only a test)

    CREATE TABLE dbo.Fact1
    (
         ShortId bigint NOT NULL
        ,TestName varchar(50) NOT NULL
        ,TestValue varchar(50) NOT NULL
        ,CreateDate DateTime NOT NULL 
    )
    WITH
    (
        DISTRIBUTION = HASH (ShortId)
    )
    


  • 我创建了5个存储过程,从
    阶段插入数据。

  • I created 5 stored procedures that inserts data into fact from stage.

    CREATE PROCEDURE dbo.TestLoad1
    AS
    BEGIN
        INSERT INTO dbo.Fact1   --this is dbo.Fact2 in sproc 2 etc...
        SELECT 
           stg.ShortId
          ,stg.PropertyName
          ,stg.PropertyValue 
          ,stg.AcquistionTime
        FROM dbo.Stage1 stg
            WHERE stg.ShortId NOT IN (SELECT ShortId from dbo.Fact1) --Fact2 etc..
    END
    


  • 在C#中创建了一个快速测试方法, 5连接,
    命令并使用BeginExecuteReader / EndExecuteReader执行
    sproc。 (这只是一个测试,所以原谅样式/代码)

  • In C# I created a quick test method that creates 5 connections, commands and uses BeginExecuteReader/EndExecuteReader to execute the sproc. (this is just a test, so forgive the style/code)

    SqlConnection cnn1 = new SqlConnection("Data Source=<server>;Initial Catalog=<database>;Persist Security Info = True;User ID =<username>;Password = <password>;Pooling = False;MultipleActiveResultSets = False;Connect Timeout = 30;Encrypt = True;TrustServerCertificate = False");
    SqlConnection cnn2 = new SqlConnection("Data Source=<server>;Initial Catalog=<database>;Persist Security Info = True;User ID =<username>;Password = <password>;Pooling = False;MultipleActiveResultSets = False;Connect Timeout = 30;Encrypt = True;TrustServerCertificate = False");
    SqlConnection cnn3 = new SqlConnection("Data Source=<server>;Initial Catalog=<database>;Persist Security Info = True;User ID =<username>;Password = <password>;Pooling = False;MultipleActiveResultSets = False;Connect Timeout = 30;Encrypt = True;TrustServerCertificate = False");
    SqlConnection cnn4 = new SqlConnection("Data Source=<server>;Initial Catalog=<database>;Persist Security Info = True;User ID =<username>;Password = <password>;Pooling = False;MultipleActiveResultSets = False;Connect Timeout = 30;Encrypt = True;TrustServerCertificate = False");
    SqlConnection cnn5 = new SqlConnection("Data Source=<server>;Initial Catalog=<database>;Persist Security Info = True;User ID =<username>;Password = <password>;Pooling = False;MultipleActiveResultSets = False;Connect Timeout = 30;Encrypt = True;TrustServerCertificate = False");
    
    SqlCommand cmd1;
    SqlCommand cmd2;
    SqlCommand cmd3;
    SqlCommand cmd4;
    SqlCommand cmd5;
    IAsyncResult result1;
    IAsyncResult result2;
    IAsyncResult result3;
    IAsyncResult result4;
    IAsyncResult result5;
    SqlDataReader reader1;
    SqlDataReader reader2;
    SqlDataReader reader3;
    SqlDataReader reader4;
    SqlDataReader reader5;
    
    cnn1.Open();
    cnn2.Open();
    cnn3.Open();
    cnn4.Open();
    cnn5.Open();
    
    cmd1 = new SqlCommand("dbo.TestLoad1", cnn1);
    cmd2 = new SqlCommand("dbo.TestLoad2", cnn2);
    cmd3 = new SqlCommand("dbo.TestLoad3", cnn3);
    cmd4 = new SqlCommand("dbo.TestLoad4", cnn4);
    cmd5 = new SqlCommand("dbo.TestLoad5", cnn5);
    
    cmd1.CommandType = CommandType.StoredProcedure;
    cmd2.CommandType = CommandType.StoredProcedure;
    cmd3.CommandType = CommandType.StoredProcedure;
    cmd4.CommandType = CommandType.StoredProcedure;
    cmd5.CommandType = CommandType.StoredProcedure;
    
    result1 = cmd1.BeginExecuteReader(CommandBehavior.SingleRow);
    result2 = cmd2.BeginExecuteReader(CommandBehavior.SingleRow);
    result3 = cmd3.BeginExecuteReader(CommandBehavior.SingleRow);
    result4 = cmd4.BeginExecuteReader(CommandBehavior.SingleRow);
    result5 = cmd5.BeginExecuteReader(CommandBehavior.SingleRow);
    
    reader1 = cmd1.EndExecuteReader(result1);  //this is where the code waits for 5 seconds
    reader2 = cmd2.EndExecuteReader(result2);
    reader3 = cmd3.EndExecuteReader(result3);
    reader4 = cmd4.EndExecuteReader(result4);
    reader5 = cmd5.EndExecuteReader(result5);
    
    reader1.Close();
    reader2.Close();
    reader3.Close();
    reader4.Close();
    reader5.Close();
    


  • 调试此C#代码时,每个语句都是< ; 1ms直到我得到线reader1 = cmd1.EndExecuteReader(result1);
    这里它将等待4-5秒,然后移动,每一行后再次快速(< 1ms)。

    When debugging this C# code each statement is < 1ms until I get to the line reader1 = cmd1.EndExecuteReader(result1); Here it will wait for 4-5 seconds then move on and every line after is again quick (<1ms).

    在此延迟期间如果我运行select *从sys.dm_pdw_exec_requests我看到所有5个请求排队和运行。如果我继续重新运行查询持续增加,然后突然(大约5秒)所有5个查询说他们完成。

    During that delay if I run select * from sys.dm_pdw_exec_requests I see all 5 requests are queued up and running. If I continue to re-run the query duration keeps increasing then all of sudden (around 5 seconds) all 5 queries say they completed.

    任何帮助将在解释我做错了或Azure SQL DW在内部做什么感谢。

    Any help would be appreciated in explaining what I am doing wrong or what Azure SQL DW is doing internally.

    谢谢

    推荐答案

    SQL Data Warehouse允许单一查询以利用单个查询来利用所有CPU或IO。当查询能够充分利用资源时,添加另一个同样竞争同一资源的查询将意味着两者都运行得更慢。也就是说,如果有一个查询使用100%的CPU并运行另一个也同时使用100%的查询,那么这两个查询将需要两倍的时间。这样做的好处是单个查询将尽可能快地运行,并且竞争不同资源的两个查询也将尽可能快地运行。当你运行一个像上面的测试,其中所有的查询本质上是相同的,预期串行或并发运行测试将需要相同的时间量。

    SQL Data Warehouse allows for a single query to utilize all CPU or IO with a single query. When query is able to fully utilize a resource, adding another query which also contends for the same resource will mean that both will run slower. That is, if you have a query which uses 100% of CPU and run another query which also uses 100% concurrently, both queries will take twice as long. The benefit of this is that a single query will run as fast as possible and two queries which contend for different resources will also run as fast as possible. When you run a test like the one above, where all queries are essentially identical, it would be expected that running the tests serially or concurrently will take the same amount of time.

    要进一步调查结果,您可能会发现监测文章有益。除了查看sys.dm_pdw_exec_requests,尝试看看sys.dm_pdw_request_steps(通过分布式SQL步骤的执行时间),我预计5秒钟的大部分时间以及sys.dm_pdw_sql_requests(分布的执行时间) )。

    To further investigate your results, you might find the monitoring article beneficial. In addition to looking at sys.dm_pdw_exec_requests, try taking a look at sys.dm_pdw_request_steps (execution time by Distributed SQL step), which I would expect to account for the majority of the 5 seconds as well as sys.dm_pdw_sql_requests (execution time by distribution).

    BTW,上面提到你没有为你的测试添加一个columnstore索引。在SQL DW中,默认表类型为clustered columnstore 。要删除此索引,请按如下方式更改WITH子句:

    BTW, above you mentioned that you didn't add a columnstore index for your test. In SQL DW the default table type is clustered columnstore. To remove this index, change your WITH clause as follows...


    WITH

    DISTRIBUTION = HASH ShortId),HEAP

    WITH ( DISTRIBUTION = HASH (ShortId), HEAP )

    希望这有帮助。

    这篇关于测试Azure SQL DW并发显示内部Azure并未同时运行过程的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆