Generate all combinations in SQL


Problem Description


I need to generate all combinations of size @k from a given set of size @n. Can someone please review the following SQL and determine first whether the logic returns the expected results, and second whether there is a better way?

/*CREATE FUNCTION dbo.Factorial ( @x int ) 
RETURNS int 
AS
BEGIN
    DECLARE @value int

    IF @x <= 1
        SET @value = 1
    ELSE
        SET @value = @x * dbo.Factorial( @x - 1 )

    RETURN @value
END
GO*/
SET NOCOUNT ON;
DECLARE @k int = 5, @n int;
DECLARE @set table ( [value] varchar(24) );
DECLARE @com table ( [index] int );

INSERT @set VALUES ('1'),('2'),('3'),('4'),('5'),('6');

SELECT @n = COUNT(*) FROM @set;

DECLARE @combinations int = dbo.Factorial(@n) / (dbo.Factorial(@k) * dbo.Factorial(@n - @k));

PRINT CAST(@combinations as varchar(max)) + ' combinations';

DECLARE @index int = 1;

WHILE @index <= @combinations
BEGIN
    INSERT @com VALUES (@index)
    SET @index = @index + 1
END;

WITH [set] as (
    SELECT 
        [value], 
        ROW_NUMBER() OVER ( ORDER BY [value] ) as [index]
    FROM @set
)
SELECT 
    [values].[value], 
    [index].[index] as [combination]
FROM [set] [values]
CROSS JOIN @com [index]
WHERE ([index].[index] + [values].[index] - 1) % (@n) BETWEEN 1 AND @k
ORDER BY
    [index].[index];

Solution

Returning Combinations

Using a numbers table or number-generating CTE, select 0 through 2^n - 1. Use the bit positions containing 1s in these numbers to indicate the presence or absence of each member of the set in a combination, and eliminate the numbers that don't have the correct count of 1-bits; you should then be able to return a result set with all the combinations you desire.
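As a cross-check of the idea (an illustration in Python, not the T-SQL itself; the function name is mine), here is the same bitmask technique in miniature:

```python
from itertools import combinations

def combos_by_bitmask(items, k):
    """Scan 0 .. 2^n - 1 and keep the numbers with exactly k 1-bits;
    bit i set means items[i] belongs to that combination."""
    n = len(items)
    return {
        num: [items[i] for i in range(n) if num & (1 << i)]
        for num in range(2 ** n)
        if bin(num).count("1") == k
    }

subsets = combos_by_bitmask(list("ABCDEF"), 3)
print(len(subsets))   # C(6, 3) = 20
print(subsets[7])     # 7 = 0b111 -> ['A', 'B', 'C']
```

Each dictionary key plays the role of ComboID in the SQL version.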

WITH Nums (Num) AS (
   SELECT Num
   FROM Numbers
   WHERE Num BETWEEN 0 AND POWER(2, @n) - 1
), BaseSet AS (
   SELECT ind = Power(2, Row_Number() OVER (ORDER BY Value) - 1), *
   FROM @set
), Combos AS (
   SELECT
      ComboID = N.Num,
      S.Value,
      Cnt = Count(*) OVER (PARTITION BY N.Num)
   FROM
      Nums N
      INNER JOIN BaseSet S ON N.Num & S.ind <> 0
)
SELECT
   ComboID,
   Value
FROM Combos
WHERE Cnt = @k
ORDER BY ComboID, Value;

This query performs pretty well, but I thought of a way to optimize it, cribbing from the Nifty Parallel Bit Count to first get the right number of items taken at a time. This performs 3 to 3.5 times faster (both CPU and time):

WITH Nums AS (
   SELECT Num, P1 = (Num & 0x55555555) + ((Num / 2) & 0x55555555)
   FROM dbo.Numbers
   WHERE Num BETWEEN 0 AND POWER(2, @n) - 1
), Nums2 AS (
   SELECT Num, P2 = (P1 & 0x33333333) + ((P1 / 4) & 0x33333333)
   FROM Nums
), Nums3 AS (
   SELECT Num, P3 = (P2 & 0x0f0f0f0f) + ((P2 / 16) & 0x0f0f0f0f)
   FROM Nums2
), BaseSet AS (
   SELECT ind = Power(2, Row_Number() OVER (ORDER BY Value) - 1), *
   FROM @set
)
SELECT
   ComboID = N.Num,
   S.Value
FROM
   Nums3 N
   INNER JOIN BaseSet S ON N.Num & S.ind <> 0
WHERE P3 % 255 = @k
ORDER BY ComboID, Value;

I went and read the bit-counting page and think that this could perform better if I don't do the % 255 but go all the way with bit arithmetic. When I get a chance I'll try that and see how it stacks up.

My performance claims are based on the queries run without the ORDER BY clause. For clarity, what this code is doing is counting the number of 1-bits set in Num from the Numbers table. The number is used as a sort of indexer that chooses which elements of the set are in the current combination, so a Num with exactly @k 1-bits selects exactly @k elements.
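For reference, the same masked-add steps expressed in Python (an illustration of the bit-count logic with the query's own masks, not code from the answer):

```python
def parallel_popcount(num):
    """Count 1-bits with the same pairwise-sum masks the query uses:
    each step folds adjacent bit groups into wider sums."""
    p1 = (num & 0x55555555) + ((num >> 1) & 0x55555555)  # 2-bit sums
    p2 = (p1 & 0x33333333) + ((p1 >> 2) & 0x33333333)    # 4-bit sums
    p3 = (p2 & 0x0F0F0F0F) + ((p2 >> 4) & 0x0F0F0F0F)    # 8-bit sums
    return p3 % 255  # the four byte sums collapse modulo 255

print(parallel_popcount(0b11111))  # 5
```

(Integer division by 2, 4, and 16 in the T-SQL is the same as the right shifts here for non-negative numbers.)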

I hope you like it!

For the record, this technique of using the bit pattern of integers to select members of a set is what I've coined the "Vertical Cross Join." It effectively results in the cross join of multiple sets of data, where the number of sets & cross joins is arbitrary. Here, the number of sets is the number of items taken at a time.

Actually cross joining in the usual horizontal sense (of adding more columns to the existing list of columns with each join) would look something like this:

SELECT
   A.Value,
   B.Value,
   C.Value
FROM
   @Set A
   CROSS JOIN @Set B
   CROSS JOIN @Set C
WHERE
   A.Value = 'A'
   AND B.Value = 'B'
   AND C.Value = 'C'

My queries above effectively "cross join" as many times as necessary with only one join. The results are unpivoted compared to actual cross joins, sure, but that's a minor matter.

Critique of Your Code

First, may I suggest this change to your Factorial UDF:

ALTER FUNCTION dbo.Factorial (
   @x bigint
)
RETURNS bigint
AS
BEGIN
   IF @x <= 1 RETURN 1
   RETURN @x * dbo.Factorial(@x - 1)
END

Now you can calculate much larger sets of combinations, plus it's more efficient. You might even consider using decimal(38, 0) to allow larger intermediate calculations in your combination calculations.
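As an aside, it is the intermediate factorials that overflow, so the binomial coefficient can also be built multiplicatively, keeping intermediates near the final answer. A hedged Python sketch (`n_choose_k` is an illustrative name, not from the answer):

```python
def n_choose_k(n, k):
    """Compute C(n, k) by multiplying and dividing incrementally.
    After step i the running value is C(n - k + i, i), so each
    floor division is exact and intermediates stay small."""
    k = min(k, n - k)
    result = 1
    for i in range(1, k + 1):
        result = result * (n - k + i) // i
    return result

print(n_choose_k(17, 5))   # 6188 rows for the 17•5 test below
print(n_choose_k(21, 11))  # 352716, the central peak in the charts
```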

Second, your given query does not return the correct results. For example, using my test data from the performance testing below, set 1 is the same as set 18. It looks like your query takes a sliding stripe that wraps around: each set is always 5 adjacent members, looking something like this (I pivoted to make it easier to see):

 1 ABCDE            
 2 ABCD            Q
 3 ABC            PQ
 4 AB            OPQ
 5 A            NOPQ
 6             MNOPQ
 7            LMNOP 
 8           KLMNO  
 9          JKLMN   
10         IJKLM    
11        HIJKL     
12       GHIJK      
13      FGHIJ       
14     EFGHI        
15    DEFGH         
16   CDEFG          
17  BCDEF           
18 ABCDE            
19 ABCD            Q

Compare the pattern from my queries:

 31 ABCDE  
 47 ABCD F 
 55 ABC EF 
 59 AB DEF 
 61 A CDEF 
 62  BCDEF 
 79 ABCD  G
 87 ABC E G
 91 AB DE G
 93 A CDE G
 94  BCDE G
103 ABC  FG
107 AB D FG
109 A CD FG
110  BCD FG
115 AB  EFG
117 A C EFG
118  BC EFG
121 A  DEFG
...

Just to drive the bit-pattern -> index of combination thing home for anyone interested, notice that 31 in binary = 11111 and the pattern is ABCDE. 121 in binary is 1111001 and the pattern is A__DEFG (backwards mapped).
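That mapping is easy to check in Python (illustration only; the helper name is mine):

```python
def decode_combo(num, members="ABCDEFG"):
    """Translate a combination number into its members: bit i set -> members[i]."""
    return "".join(m for i, m in enumerate(members) if num & (1 << i))

print(decode_combo(31))   # ABCDE  (31 = 0b11111)
print(decode_combo(121))  # ADEFG  (121 = 0b1111001)
```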

Performance Results With A Real Numbers Table

I ran some performance testing with big sets on my second query above. I do not have a record at this time of the server version used. Here's my test data:

DECLARE
   @k int,
   @n int;

DECLARE @set TABLE (value varchar(24));
INSERT @set VALUES ('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('I'),('J'),('K'),('L'),('M'),('N'),('O'),('P'),('Q');
SET @n = @@RowCount;
SET @k = 5;

DECLARE @combinations bigint = dbo.Factorial(@n) / (dbo.Factorial(@k) * dbo.Factorial(@n - @k));
SELECT CAST(@combinations as varchar(max)) + ' combinations', MaxNumUsedFromNumbersTable = POWER(2, @n);

Peter showed that this "vertical cross join" doesn't perform as well as simply writing dynamic SQL to actually do the CROSS JOINs it avoids. At the trivial cost of a few more reads, his solution has metrics between 10 and 17 times better. The performance of his query decreases faster than mine as the amount of work increases, but not fast enough to stop anyone from using it.
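Peter's dynamic SQL itself is not reproduced in this answer; presumably it emits one join per chosen item with an ascending-value filter so each combination appears exactly once. A hypothetical Python sketch of such a generator (`build_combo_sql` and the exact output shape are my assumptions, not Peter's actual code):

```python
def build_combo_sql(k, table="@set"):
    """Build a k-way self-join where each alias must take a value strictly
    greater than the previous one, so each combination appears once.
    (A guess at the shape of the dynamic SQL; the real query isn't shown here.)"""
    aliases = [chr(ord("A") + i) for i in range(k)]  # A, B, C, ...
    select = ", ".join(f"{a}.Value AS V{i + 1}" for i, a in enumerate(aliases))
    sql = f"SELECT {select}\nFROM {table} {aliases[0]}"
    for prev, cur in zip(aliases, aliases[1:]):
        sql += f"\nJOIN {table} {cur} ON {cur}.Value > {prev}.Value"
    return sql

print(build_combo_sql(3))
```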

The second set of numbers below is each metric divided by the value in the first row of the table, just to show how it scales.

Erik

Items  CPU   Writes  Reads  Duration |  CPU  Writes  Reads Duration
----- ------ ------ ------- -------- | ----- ------ ------ --------
17•5    7344     0     3861    8531  |
18•9   17141     0     7748   18536  |   2.3          2.0      2.2
20•10  76657     0    34078   84614  |  10.4          8.8      9.9
21•11 163859     0    73426  176969  |  22.3         19.0     20.7
21•20 142172     0    71198  154441  |  19.4         18.4     18.1

Peter

Items  CPU   Writes  Reads  Duration |  CPU  Writes  Reads Duration
----- ------ ------ ------- -------- | ----- ------ ------ --------
17•5     422    70    10263     794  | 
18•9    6046   980   219180   11148  |  14.3   14.0   21.4    14.0
20•10  24422  4126   901172   46106  |  57.9   58.9   87.8    58.1
21•11  58266  8560  2295116  104210  | 138.1  122.3  223.6   131.3
21•20  51391     5  6291273   55169  | 121.8    0.1  613.0    69.5

Extrapolating, eventually my query will be cheaper (though it is from the start in reads), but not for a long time. To use 21 items in the set already requires a numbers table going up to 2097152...

Here is a comment I originally made before realizing that my solution would perform drastically better with an on-the-fly numbers table:

I love single-query solutions to problems like this, but if you're looking for the best performance, an actual cross join is best, unless you start dealing with seriously huge numbers of combinations. But what does anyone want with hundreds of thousands or even millions of rows? Even the growing number of reads doesn't seem too much of a problem, though 6 million is a lot and it's getting bigger fast...

Anyway. Dynamic SQL wins. I still had a beautiful query. :)

Performance Results with an On-The-Fly Numbers Table

When I originally wrote this answer, I said:

Note that you could use an on-the-fly numbers table, but I haven't tried it.

Well, I tried it, and the results were that it performed much better! Here is the query I used:

DECLARE @N int = 16, @K int = 12;

CREATE TABLE #Set (Value char(1) PRIMARY KEY CLUSTERED);
CREATE TABLE #Items (Num int);
INSERT #Items VALUES (@K);
INSERT #Set 
SELECT TOP (@N) V
FROM
   (VALUES ('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('I'),('J'),('K'),('L'),('M'),('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U'),('V'),('W'),('X'),('Y'),('Z')) X (V);
GO
DECLARE
   @N int = (SELECT Count(*) FROM #Set),
   @K int = (SELECT TOP 1 Num FROM #Items);

DECLARE @combination int, @value char(1);
WITH L0 AS (SELECT 1 N UNION ALL SELECT 1),
L1 AS (SELECT 1 N FROM L0, L0 B),
L2 AS (SELECT 1 N FROM L1, L1 B),
L3 AS (SELECT 1 N FROM L2, L2 B),
L4 AS (SELECT 1 N FROM L3, L3 B),
L5 AS (SELECT 1 N FROM L4, L4 B),
Nums AS (SELECT Row_Number() OVER(ORDER BY (SELECT 1)) Num FROM L5),
Nums1 AS (
   SELECT Num, P1 = (Num & 0x55555555) + ((Num / 2) & 0x55555555)
   FROM Nums
   WHERE Num BETWEEN 0 AND Power(2, @N) - 1
), Nums2 AS (
   SELECT Num, P2 = (P1 & 0x33333333) + ((P1 / 4) & 0x33333333)
   FROM Nums1
), Nums3 AS (
   SELECT Num, P3 = (P2 & 0x0F0F0F0F) + ((P2 / 16) & 0x0F0F0F0F)
   FROM Nums2
), BaseSet AS (
   SELECT Ind = Power(2, Row_Number() OVER (ORDER BY Value) - 1), *
   FROM #Set
)
SELECT
   @Combination = N.Num,
   @Value = S.Value
FROM
   Nums3 N
   INNER JOIN BaseSet S
      ON N.Num & S.Ind <> 0
WHERE P3 % 255 = @K;

Note that I selected the values into variables to reduce the time and memory needed to test everything. The server still does all the same work. I modified Peter's version to be similar, and removed unnecessary extras so they were both as lean as possible. The server version used for these tests is Microsoft SQL Server 2008 (RTM) - 10.0.1600.22 (Intel X86) Standard Edition on Windows NT 5.2 <X86> (Build 3790: Service Pack 2), running on a VM.

Below are charts showing the performance curves for values of N and K up to 21. The base data for them is in another answer on this page. The values are the result of 5 runs of each query at each K and N value, followed by throwing out the best and worst values for each metric and averaging the remaining 3.

Basically, my version has a "shoulder" (in the leftmost corner of the chart) at high values of N and low values of K that make it perform worse there than the dynamic SQL version. However, this stays fairly low and constant, and the central peak around N = 21 and K = 11 is much lower for Duration, CPU, and Reads than the dynamic SQL version.

I included a chart of the number of rows each item is expected to return so you can see how the query performs stacked up against how big a job it has to do.

Please see my additional answer on this page for the complete performance results. I hit the post character limit and could not include it here. (Any ideas where else to put it?) To put things in perspective against my first version's performance results, here's the same format as before:

Erik

Items  CPU  Duration  Reads  Writes |  CPU  Duration  Reads 
----- ----- -------- ------- ------ | ----- -------- -------
17•5    354      378   12382      0 |
18•9   1849     1893   97246      0 |   5.2      5.0     7.9
20•10  7119     7357  369518      0 |  20.1     19.5    29.8
21•11 13531    13807  705438      0 |  38.2     36.5    57.0
21•20  3234     3295     48       0 |   9.1      8.7     0.0

Peter

Items  CPU  Duration  Reads  Writes |  CPU  Duration  Reads 
----- ----- -------- ------- ------ | ----- -------- -------
17•5     41       45    6433      0 | 
18•9   2051     1522  214021      0 |  50.0     33.8    33.3
20•10  8271     6685  864455      0 | 201.7    148.6   134.4
21•11 18823    15502 2097909      0 | 459.1    344.5   326.1
21•20 25688    17653 4195863      0 | 626.5    392.3   652.2

Conclusions

  • On-the-fly numbers tables are better than a real table containing rows, since reading one at huge rowcounts requires a lot of I/O. It is better to use a little CPU.
  • My initial tests weren't broad enough to really show the performance characteristics of the two versions.
  • Peter's version could be improved by making each JOIN not only be greater than the prior item, but also restrict the maximum value based on how many more items have to be fit into the set. For example, at 21 items taken 21 at a time, there is only one answer of 21 rows (all 21 items, one time), but the intermediate rowsets in the dynamic SQL version, early in the execution plan, contain combinations such as "AU" at step 2 even though this will be discarded at the next join since there is no value higher than "U" available. Similarly, an intermediate rowset at step 5 will contain "ARSTU" but the only valid combo at this point is "ABCDE". This improved version would not have a lower peak at the center, so possibly not improving it enough to become the clear winner, but it would at least become symmetrical so that the charts would not stay maxed past the middle of the region but would fall back to near 0 as my version does (see the top corner of the peaks for each query).
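The improvement described in the last bullet can be sketched as a recursive enumeration where each position's maximum index is capped so enough larger items remain to finish the set (illustrative Python, not Peter's SQL):

```python
def bounded_combos(n, k):
    """Enumerate k-combinations of range(n) with the bound described above:
    the item chosen at depth d can be at most n - 1 - (k - d - 1), so enough
    larger items remain -- no dead-end partial rows like 'AU' are ever built."""
    def recurse(start, depth, prefix):
        if depth == k:
            yield tuple(prefix)
            return
        # the last legal choice still leaves (k - depth - 1) items above it
        for i in range(start, n - (k - depth - 1)):
            yield from recurse(i + 1, depth + 1, prefix + [i])
    return list(recurse(0, 0, []))

print(len(bounded_combos(6, 3)))   # 20 = C(6, 3)
print(len(bounded_combos(21, 21))) # 1: the only possible set, found directly
```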

Duration Analysis

  • There is no really significant difference between the versions in duration (>100ms) until 14 items taken 12 at a time. Up to this point, my version wins 30 times and the dynamic SQL version wins 43 times.
  • Starting at 14•12, my version was faster 65 times (59 >100ms), the dynamic SQL version 64 times (60 >100ms). However, all the times my version was faster, it saved a total averaged duration of 256.5 seconds, and when the dynamic SQL version was faster, it saved 80.2 seconds.
  • The total averaged duration for all trials was Erik 270.3 seconds, Peter 446.2 seconds.
  • If a lookup table were created to determine which version to use (picking the faster one for the inputs), all the results could be performed in 188.7 seconds. Using the slowest one each time would take 527.7 seconds.

Reads Analysis

The duration analysis showed my query winning by a significant but not overly large margin. When the metric is switched to reads, a very different picture emerges: my query uses on average 1/10th the reads.

  • There is no really significant difference between the versions in reads (>1000) until 9 items taken 9 at a time. Up to this point, my version wins 30 times and the dynamic SQL version wins 17 times.
  • Starting at 9•9, my version used fewer reads 118 times (113 >1000), the dynamic SQL version 69 times (31 >1000). However, all the times my version used fewer reads, it saved a total averaged 75.9M reads, and when the dynamic SQL version was faster, it saved 380K reads.
  • The total averaged reads for all trials was Erik 8.4M, Peter 84M.
  • If a lookup table were created to determine which version to use (picking the best one for the inputs), all the results could be performed in 8M reads. Using the worst one each time would take 84.3M reads.

I would be very interested to see the results of an updated dynamic SQL version that puts the extra upper limit on the items chosen at each step as I described above.

Addendum

The following version of my query achieves an improvement of about 2.25% over the performance results listed above. I used MIT's HAKMEM bit-counting method, and added a Convert(int) on the result of row_number() since it returns a bigint. Of course I wish this were the version I had used for all the performance testing and the charts and data above, but it is unlikely I will ever redo it, as the process was labor-intensive.

WITH L0 AS (SELECT 1 N UNION ALL SELECT 1),
L1 AS (SELECT 1 N FROM L0, L0 B),
L2 AS (SELECT 1 N FROM L1, L1 B),
L3 AS (SELECT 1 N FROM L2, L2 B),
L4 AS (SELECT 1 N FROM L3, L3 B),
L5 AS (SELECT 1 N FROM L4, L4 B),
Nums AS (SELECT Row_Number() OVER(ORDER BY (SELECT 1)) Num FROM L5),
Nums1 AS (
   SELECT Convert(int, Num) Num
   FROM Nums
   WHERE Num BETWEEN 1 AND Power(2, @N) - 1
), Nums2 AS (
   SELECT
      Num,
      P1 = Num - ((Num / 2) & 0xDB6DB6DB) - ((Num / 4) & 0x49249249)
   FROM Nums1
),
Nums3 AS (SELECT Num, Bits = ((P1 + P1 / 8) & 0xC71C71C7) % 63 FROM Nums2),
BaseSet AS (SELECT Ind = Power(2, Row_Number() OVER (ORDER BY Value) - 1), * FROM #Set)
SELECT
   N.Num,
   S.Value
FROM
   Nums3 N
   INNER JOIN BaseSet S
      ON N.Num & S.Ind <> 0
WHERE
   Bits = @K
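
For anyone sanity-checking the constants: 0xDB6DB6DB and 0x49249249 are the HAKMEM octal masks 033333333333 and 011111111111, and the final fold collapses the 6-bit group sums modulo 63. The same count translated to Python (an illustration, not part of the answer's code):

```python
def hakmem_popcount(num):
    """HAKMEM item 169: sum bits within octal (3-bit) groups, then fold
    pairs of groups into 6-bit sums and collapse them modulo 63."""
    tmp = num - ((num >> 1) & 0xDB6DB6DB) - ((num >> 2) & 0x49249249)
    return ((tmp + (tmp >> 3)) & 0xC71C71C7) % 63

print(hakmem_popcount(0b1111001))  # 5
```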

And I could not resist showing one more version that does a lookup to get the count of bits. It may even be faster than other versions:

DECLARE @BitCounts binary(255) = 
   0x01010201020203010202030203030401020203020303040203030403040405
   + 0x0102020302030304020303040304040502030304030404050304040504050506
   + 0x0102020302030304020303040304040502030304030404050304040504050506
   + 0x0203030403040405030404050405050603040405040505060405050605060607
   + 0x0102020302030304020303040304040502030304030404050304040504050506
   + 0x0203030403040405030404050405050603040405040505060405050605060607
   + 0x0203030403040405030404050405050603040405040505060405050605060607
   + 0x0304040504050506040505060506060704050506050606070506060706070708;
WITH L0 AS (SELECT 1 N UNION ALL SELECT 1),
L1 AS (SELECT 1 N FROM L0, L0 B),
L2 AS (SELECT 1 N FROM L1, L1 B),
L3 AS (SELECT 1 N FROM L2, L2 B),
L4 AS (SELECT 1 N FROM L3, L3 B),
L5 AS (SELECT 1 N FROM L4, L4 B),
Nums AS (SELECT Row_Number() OVER(ORDER BY (SELECT 1)) Num FROM L5),
Nums1 AS (SELECT Convert(int, Num) Num FROM Nums WHERE Num BETWEEN 1 AND Power(2, @N) - 1),
BaseSet AS (SELECT Ind = Power(2, Row_Number() OVER (ORDER BY Value) - 1), * FROM #Set)
SELECT
   @Combination = N.Num,
   @Value = S.Value
FROM
   Nums1 N
   INNER JOIN BaseSet S
      ON N.Num & S.Ind <> 0
WHERE
   @K =
      Convert(int, Substring(@BitCounts, N.Num & 0xFF, 1))
      + Convert(int, Substring(@BitCounts, N.Num / 256 & 0xFF, 1))
      + Convert(int, Substring(@BitCounts, N.Num / 65536 & 0xFF, 1))
      + Convert(int, Substring(@BitCounts, N.Num / 16777216, 1))
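
The same lookup idea in Python, for comparison. Note that T-SQL's Substring is 1-based, so the binary(255) literal stores the count for byte value i at position i; the sketch below simply uses a 0-indexed 256-entry list instead:

```python
# Build the 256-entry byte-popcount table that the binary(255) literal
# encodes, then count bits one byte at a time.
BIT_COUNTS = [bin(i).count("1") for i in range(256)]

def lookup_popcount(num):
    """Sum the table entries for each byte of a (up to 32-bit) number."""
    return (BIT_COUNTS[num & 0xFF]
            + BIT_COUNTS[(num >> 8) & 0xFF]
            + BIT_COUNTS[(num >> 16) & 0xFF]
            + BIT_COUNTS[(num >> 24) & 0xFF])

print(lookup_popcount(2097151))  # 21: every member of a 21-item set present
```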
