透视脚本的性能改进 [英] performance improvement on Pivoting script

查看:75
本文介绍了透视脚本的性能改进的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在我的数据生成过程中,我有一个约有1亿行的临时表. 表的结构如下:

In my data generation procedure I have a temporary table with about 100M rows. Table's structure is the following:

CREATE TABLE #table ( ProductId INT, CountryCodeID INT ,DataTypeID INT,
     Formula VARCHAR(1000) ,Yr INT, Letter VARCHAR(100) , Data FLOAT(53))

我们可以在其中填充一些虚拟数据(〜10M):

We can populate it with some dummy data (~10M):

INSERT INTO #table ( ProductId, CountryCodeID, DataTypeID, Formula, Yr, Letter, Data )
SELECT
     P.ProductID, C.CountryCodeID, D.DataTypeID, F.Formula, Y.Yr,
     L.Letter, RAND() AS Data
FROM (VALUES (
    1856),(1459),(1816),(238),(328),(444),(921),(1724),(155),(420),(795),
    (620),(1007),(153),(1659),(95),(952),(1476),(759),(1461),(1958),(1341),
    (116)) P(productID)
CROSS JOIN (VALUES (16),(302),(422),(36),(95),(744),(4),(285),(1849),(1402),
    (430),(835),(214),(1476),(711),(36),(142),(428),(768),(78),(510),(945),
    (1125)) C(CountryCodeID)
CROSS JOIN (VALUES (1120),(1121),(1122),(1123),(1124),(1125),(1126),(1127),
    (1128)) D(DataTypeID)
CROSS JOIN (VALUES ('A+B'),('A/B/(A+B+C+D+E+G)'),('A/B/(A+B+C+D+E)'),
    ('A/B/(A+B+C+D)'),('A/B/(A+B+C)'),('A/B/(A+B)'),
    ('A/B/(A+B+C+D+E+G+Z)')) F(Formula)
CROSS JOIN (VALUES (1977),(1978),(1979),(1980),(1981),(1982),(1983),(1984),
    (1985),(1986),(1987),(1988), (2000),(2001),(2002),(2003),        (2004),
    (2005),(2006),(2007),(2008),(2009),(2010),(2011),(2012),(2013)) Y(Yr)
CROSS JOIN (VALUES ('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('J '),
    ('K'),('L'),('M')) L(Letter) 
--10 398 024

在过程结束时,我需要将表旋转到另一个临时表:

In the end of my procedure I need to Pivot my table to another temporary table:

SELECT
     ProductID,CountryCodeId,DataTypeID,Formula,
    Yr,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,X,Z,W
INTO #Final3
FROM #table
PIVOT ( MAX(Data) FOR Letter IN (
     A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,X,Z,W
    ) ) AS pvt;

最后一个查询非常慢.从字面上看需要很长时间.

And the last query is extremely slow. It takes literally ages.

执行计划:

开头有表格扫描,但只有2% 将表格插入#Final占83%

there is table scan in the beginning but only 2% table insert into #Final takes 83%

有趣的是,估计的行数nr比实际行数大得多.

Interesting that estimated nr of rows much bigger than actual number of rows.

所以我的问题是如何提高性能.我的一些想法:

So my question is how can I improve performance. Some of my ideas:

  • 索引?;
  • 也许可以使用NTILE#table分成10个较小的表,然后创建一个循环,执行10个枢轴?
  • Indexes?;
  • Maybe it can help to divide #table into 10 smaller tables using NTILE and after that create a loop which will perform 10 pivots?

说实话,我没有其他想法...如果您有相关经验,请提出建议. 谢谢.

To be honest I don't have any other ideas ... If you have relevant experience - please advice. Thanks.

推荐答案

我尝试使用老式的PIVOTing(使用CASE),并在系统上获得了更好的结果(速度提高了2到3倍).还创建了一个CLUSTERED INDEX,因此插入数据的速度会慢很多(对于此测试,SQL Server抱怨密钥长度).尝试一下,看看它是否可以改善您的业务案例.

I've tried using the old-school PIVOTing (using CASE) and got better results on my system (2~3 times faster). Also created a CLUSTERED INDEX, so inserting data will be a lot slower (for this test, SQL Server complains about the key length). Try it out and see if it improves your business case.

希望这会有所帮助.如果没有,请使用我的有关使用报告矩阵 OLAP多维数据集.另外,请检查以下链接:枢轴转换.如果您不能使用它们,甚至可能自己用编程语言编写PIVOT也可能会更好.

Hope this helps. If it doesn't please use my suggestions about using Reporting Matrices or OLAP Cubes. Also check this link: Pivot Transformation. If you can't use those, maybe even coding the PIVOT yourself in your programming language might perform better.

测试数据创建(使用了全局温度,因此我可以在SSMS的第二个选项卡中测试选择):

Test data creation (used global temp so I could test selection in second tab in SSMS):

CREATE TABLE ##table ( ProductId INT, CountryCodeID INT ,DataTypeID INT, Formula VARCHAR(1000) ,Yr INT, Letter VARCHAR(100) , Data FLOAT(53));
CREATE CLUSTERED INDEX tt ON ##table(ProductId, CountryCodeID, DataTypeID,Formula,Yr);

INSERT INTO ##table ( ProductId, CountryCodeID, DataTypeID, Formula, Yr, Letter, Data )
SELECT P.ProductID, C.CountryCodeID, D.DataTypeID, F.Formula, Y.Yr, L.Letter, RAND() AS Data
FROM (VALUES (1856),(1459),(1816),(238),(328),(444),(921),(1724),(155),(420),(795),(620),(1007),(153),(1659),(95),(952),(1476),(759),(1461),(1958),(1341),(116)) P(productID)
CROSS JOIN (VALUES (16),(302),(422),(36),(95),(744),(4),(285),(1849),(1402),(430),(835),(214),(1476),(711),(36),(142),(428),(768),(78),(510),(945),(1125)) C(CountryCodeID)
CROSS JOIN (VALUES (1120),(1121),(1122),(1123),(1124),(1125),(1126),(1127),(1128)) D(DataTypeID)
CROSS JOIN (VALUES ('A+B'),('A/B/(A+B+C+D+E+G)'),('A/B/(A+B+C+D+E)'),('A/B/(A+B+C+D)'),('A/B/(A+B+C)'),('A/B/(A+B)'),('A/B/(A+B+C+D+E+G+Z)')) F(Formula)
CROSS JOIN (VALUES (1977),(1978),(1979),(1980),(1981),(1982),(1983),(1984),(1985),(1986),(1987),(1988), (2000),(2001),(2002),(2003),(2004),(2005),(2006),(2007),(2008),(2009),(2010),(2011),(2012),(2013)) Y(Yr)
CROSS JOIN (VALUES ('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('J '),('K'),('L'),('M')) L(Letter) ;

#final3中生成数据:

SELECT 
    ProductID,
    CountryCodeId,
    DataTypeID,
    Formula,
    Yr,
    MAX(CASE WHEN Letter='A' THEN Data END) AS A,
    MAX(CASE WHEN Letter='B' THEN Data END) AS B,
    MAX(CASE WHEN Letter='C' THEN Data END) AS C,
    MAX(CASE WHEN Letter='D' THEN Data END) AS D,
    MAX(CASE WHEN Letter='E' THEN Data END) AS E,
    MAX(CASE WHEN Letter='F' THEN Data END) AS F,
    MAX(CASE WHEN Letter='G' THEN Data END) AS G,
    MAX(CASE WHEN Letter='H' THEN Data END) AS H,
    MAX(CASE WHEN Letter='J' THEN Data END) AS J,
    MAX(CASE WHEN Letter='K' THEN Data END) AS K,
    MAX(CASE WHEN Letter='L' THEN Data END) AS L,
    MAX(CASE WHEN Letter='M' THEN Data END) AS M
INTO 
    #Final3
FROM 
    ##table
GROUP BY
    ProductID,
    CountryCodeId,
    DataTypeID,
    Formula,
    Yr
ORDER BY 
    ProductID,
    CountryCodeID,
    Yr;

这篇关于透视脚本的性能改进的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆