SQL, Auxiliary table of numbers

Question

For certain types of SQL queries, an auxiliary table of numbers can be very useful. It may be created as a table with as many rows as you need for a particular task, or as a user-defined function that returns the number of rows required in each query.

What is the optimal way to create such a function?

Recommended answer

Heh... sorry I'm so late responding to an old post. And, yeah, I had to respond because the most popular answer (at the time, the Recursive CTE answer with the link to 14 different methods) on this thread is, ummm... performance challenged at best.

First, the article with the 14 different solutions is fine for seeing the different methods of creating a Numbers/Tally table on the fly but as pointed out in the article and in the cited thread, there's a very important quote...

"suggestions regarding efficiency and performance are often subjective. Regardless of how a query is being used, the physical implementation determines the efficiency of a query. Therefore, rather than relying on biased guidelines, it is imperative that you test the query and determine which one performs better."

Ironically, the article itself contains many subjective statements and "biased guidelines" such as "a recursive CTE can generate a number listing pretty efficiently" and "This is an efficient method of using WHILE loop from a newsgroup posting by Itzik Ben-Gan" (which I'm sure he posted just for comparison purposes). C'mon folks... Just mentioning Itzik's good name may lead some poor slob into actually using that horrible method. The author should practice what (s)he preaches and should do a little performance testing before making such ridiculously incorrect statements, especially when any scalability is required.

With the thought of actually doing some testing before making any subjective claims about what any code does or what someone "likes", here's some code you can do your own testing with. Set up Profiler for the SPID you're running the test from and check it out for yourself... just do a "Search'n'Replace" of the number 1000000 with your "favorite" number and see...

--===== Test for 1000000 rows ==================================
GO
--===== Traditional RECURSIVE CTE method
   WITH Tally (N) AS 
        ( 
         SELECT 1 UNION ALL 
         SELECT 1 + N FROM Tally WHERE N < 1000000 
        ) 
 SELECT N 
   INTO #Tally1 
   FROM Tally 
 OPTION (MAXRECURSION 0);
GO
--===== Traditional WHILE LOOP method
 CREATE TABLE #Tally2 (N INT);
    SET NOCOUNT ON;
DECLARE @Index INT;
    SET @Index = 1;
  WHILE @Index <= 1000000 
  BEGIN 
         INSERT #Tally2 (N) 
         VALUES (@Index);
            SET @Index = @Index + 1;
    END;
GO
--===== Traditional CROSS JOIN table method
 SELECT TOP (1000000)
        ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS N
   INTO #Tally3
   FROM master.sys.all_columns ac1
  CROSS JOIN master.sys.all_columns ac2;
GO
--===== Itzik's CROSS JOINED CTE method
   WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
 SELECT N
   INTO #Tally4
   FROM cteTally
  WHERE N <= 1000000;
GO
--===== Housekeeping
   DROP TABLE #Tally1, #Tally2, #Tally3, #Tally4;
GO

While we're at it, here are the numbers I get from SQL Profiler for the values of 100, 1000, 10000, 100000, and 1000000...

SPID TextData                                 Dur(ms) CPU   Reads   Writes
---- ---------------------------------------- ------- ----- ------- ------
  51 --===== Test for 100 rows ==============       8     0       0      0
  51 --===== Traditional RECURSIVE CTE method      16     0     868      0
  51 --===== Traditional WHILE LOOP method CR      73    16     175      2
  51 --===== Traditional CROSS JOIN table met      11     0      80      0
  51 --===== Itzik's CROSS JOINED CTE method        6     0      63      0
  51 --===== Housekeeping   DROP TABLE #Tally      35    31     401      0

  51 --===== Test for 1000 rows =============       0     0       0      0
  51 --===== Traditional RECURSIVE CTE method      47    47    8074      0
  51 --===== Traditional WHILE LOOP method CR      80    78    1085      0
  51 --===== Traditional CROSS JOIN table met       5     0      98      0
  51 --===== Itzik's CROSS JOINED CTE method        2     0      83      0
  51 --===== Housekeeping   DROP TABLE #Tally       6    15     426      0

  51 --===== Test for 10000 rows ============       0     0       0      0
  51 --===== Traditional RECURSIVE CTE method     434   344   80230     10
  51 --===== Traditional WHILE LOOP method CR     671   563   10240      9
  51 --===== Traditional CROSS JOIN table met      25    31     302     15
  51 --===== Itzik's CROSS JOINED CTE method       24     0     192     15
  51 --===== Housekeeping   DROP TABLE #Tally       7    15     531      0

  51 --===== Test for 100000 rows ===========       0     0       0      0
  51 --===== Traditional RECURSIVE CTE method    4143  3813  800260    154
  51 --===== Traditional WHILE LOOP method CR    5820  5547  101380    161
  51 --===== Traditional CROSS JOIN table met     160   140     479    211
  51 --===== Itzik's CROSS JOINED CTE method      153   141     276    204
  51 --===== Housekeeping   DROP TABLE #Tally      10    15     761      0

  51 --===== Test for 1000000 rows ==========       0     0       0      0
  51 --===== Traditional RECURSIVE CTE method   41349 37437 8001048   1601
  51 --===== Traditional WHILE LOOP method CR   59138 56141 1012785   1682
  51 --===== Traditional CROSS JOIN table met    1224  1219    2429   2101
  51 --===== Itzik's CROSS JOINED CTE method     1448  1328    1217   2095
  51 --===== Housekeeping   DROP TABLE #Tally       8     0     415      0

As you can see, the Recursive CTE method is second worst, behind only the While Loop, for Duration and CPU, and it generates 8 times as much memory pressure, in the form of logical reads, as the While Loop. It's RBAR ("Row By Agonizing Row") on steroids and should be avoided, at all costs, for any single-row calculations, just as a While Loop should be avoided. There are places where recursion is quite valuable, but this ISN'T one of them.
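For the user-defined-function form that the question asks about, one option is to wrap the cross-joined CTE from the test script in an inline table-valued function. What follows is only a minimal sketch, not code from the original answer: the names dbo.fnTally and @MaxN are made up for illustration, and TOP is used in place of the WHERE N <= ... filter from the test script. Test it on your own system before relying on it.

```sql
-- Hypothetical helper (dbo.fnTally and @MaxN are illustrative names).
-- Returns the integers 1 through @MaxN as a single column named N.
CREATE FUNCTION dbo.fnTally (@MaxN BIGINT)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
   WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),           -- 2 rows
        E02(N) AS (SELECT 1 FROM E00 a CROSS JOIN E00 b),  -- 4 rows
        E04(N) AS (SELECT 1 FROM E02 a CROSS JOIN E02 b),  -- 16 rows
        E08(N) AS (SELECT 1 FROM E04 a CROSS JOIN E04 b),  -- 256 rows
        E16(N) AS (SELECT 1 FROM E08 a CROSS JOIN E08 b),  -- 65,536 rows
        E32(N) AS (SELECT 1 FROM E16 a CROSS JOIN E16 b)   -- 4,294,967,296 rows
 SELECT TOP (@MaxN)
        N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
   FROM E32;
GO
--===== Example call: build the same 1000000 row temp table as the tests
 SELECT N
   INTO #Tally5
   FROM dbo.fnTally(1000000);
GO
   DROP TABLE #Tally5;
GO
```

Because the function is a single SELECT, the optimizer can inline it into the calling query, so it should behave much like the bare cross-joined CTE. That is an expectation, not a measurement; run it through the same Profiler test to confirm.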

As a sidebar, Mr. Denny is absolutely spot on... a correctly sized permanent Numbers or Tally table is the way to go for most things. What does "correctly sized" mean? Well, most people use a Tally table to generate dates or to do splits on VARCHAR(8000). If you create an 11,000-row Tally table with the correct clustered index on "N", you'll have enough rows to create more than 30 years' worth of dates (I work with mortgages a fair bit, so 30 years is a key number for me) and certainly enough to handle a VARCHAR(8000) split. Why is "right sizing" so important? If the Tally table is used a lot, it easily fits in cache, which makes it blazingly fast without much pressure on memory at all.
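A permanent table along those lines might be built as in the sketch below. The table name dbo.Tally, the constraint name PK_Tally_N, and the start date are assumptions chosen for illustration, and the date example assumes SQL Server 2008 or later (for the DATE type and inline DECLARE initialization). Thirty years is roughly 10,958 days, which is why 11,000 rows is enough.

```sql
--===== Build an 11,000 row permanent Tally table (names are illustrative)
 CREATE TABLE dbo.Tally
        (
         N INT NOT NULL,
         CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)
        );
GO
 INSERT INTO dbo.Tally (N)
 SELECT TOP (11000)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
   FROM master.sys.all_columns ac1
  CROSS JOIN master.sys.all_columns ac2;
GO
--===== Example use: one row per day for 30 years from an assumed start date
DECLARE @StartDate DATE = '2020-01-01';

 SELECT TheDate = DATEADD(dd, t.N - 1, @StartDate)
   FROM dbo.Tally t
  WHERE t.N <= DATEDIFF(dd, @StartDate, DATEADD(yy, 30, @StartDate))
  ORDER BY t.N;
GO
```

The clustered primary key on N lets a filter such as WHERE t.N <= ... be resolved with a range seek rather than a scan of the whole table.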

Last but not least, everyone knows that if you create a permanent Tally table, it doesn't much matter which method you use to build it because 1) it's only going to be made once and 2) if it's something like an 11,000-row table, all of the methods are going to run "good enough". So why all the indignation on my part about which method to use???

The answer is that some poor guy/gal who doesn't know any better and just needs to get his or her job done might see something like the Recursive CTE method and decide to use it for something much larger and much more frequently used than building a permanent Tally table. I'm trying to protect those people, the servers their code runs on, and the company that owns the data on those servers. Yeah... it's that big a deal. It should be for everyone else, as well. Teach the right way to do things instead of "good enough". Do some testing before posting or using something from a post or book... the life you save may, in fact, be your own, especially if you think a recursive CTE is the way to go for something like this. ;-)

Thanks for listening...
