SQL / VBScript中/智能算法的快速查找款项组合 [英] SQL / VBScript / Intelligent Algorithm to find sum combinations quickly

查看:840
本文介绍了SQL / VBScript中/智能算法的快速查找款项组合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图列出所有可能的顺序(仅适用于连续和前进方向)之和的组合,在同一主题中。 清单出ROW_ID和参与的总和的行数

Am trying to list out all the possible sequential (continuous and forward direction only) sum combinations , within the same subject. Listing out the row_id and the number of rows involved in the sum.

Sample :

Input (Source Table :) 

DLID    Subject Total   
1   Science 70  
2   Science 70  
3   Science 70  
4   Science 70  
5   Maths   80  
6   Maths   80  
7   English 90  
8   English 90  
9   English 90  
10  Science 75  

Expected Result :           
ID  Number of Rows  Subject Total
1   1   Science 70
2   1   Science 70
3   1   Science 70
4   1   Science 70
5   1   Maths   80
6   1   Maths   80
7   1   English 90
8   1   English 90
9   1   English 90
10  1   Science 75
1   2   Science 140
2   2   Science 140
3   2   Science 140
5   2   Maths   160
7   2   English 180
8   2   English 180
1   3   Science 210
2   3   Science 210
7   3   English 270
1   4   Science 280

VBSript code:

VBSript Code :

myarray中 - 从access数据库中读取整个表 我是行的总数阅读 'J,如果由一个访问各行的一 M是具有同一主题后续行的数目,我们正试图检查 'n是一个计数器,从各行开始,检查高达米 - 1行是否相同的子 'k被用于对结果存储到resultarray

' myarray - reads the entire table from access database ' "i" is the total number of rows read ' "j" if to access each row one by one ' "m" is the number of subsequent rows with same subject , we are trying to check ' "n" is a counter to start from each row and check upto m - 1 rows whether same sub ' "k" is used to store the results into "resultarray"

myarray(0,j) = holds the row_id
myarray(1,j) = holds the subject
myarray(2,j) = holds the score
myarray(3 4 5 6 are other details
i is the total number of rows - around 80,000
There can be conitnuous records from the same subject as many as 700 - 800
m = is the number of rows matching / number of rows leading to the sum



For m = 1 to 700
For j = 0 to i-m
matchcount = 1

For n = 1 to m-1 
if  myarray(1,j) = myarray (1,j+n) Then 
matchcount = matchcount + 1
Else
Exit For
End If
Next

If matchcount = m Then
resultarray(2,k) = 0
For o = 0 to m - 1
resultarray(2,k) = CDbl(resultarray(2,k)) + CDbl (myarray (2,j+o))
resultarray(1,k) = m
resultarray(0,k) = ( myarray (0,j) )
resultarray(3,k) = ( myarray (3,j) )
resultarray(4,k) = ( myarray (4,j) )
resultarray(5,k) = ( myarray (1,j) )
resultarray(7,k) = ( myarray (5,j) )
resultarray(8,k) = ( myarray (6,j) )
Next
resultarray(2,k) = round(resultarray(2,k),0)
k = k + 1
ReDim Preserve resultarray(8,k)
End If

Next
Next

code是工作完美,但速度很慢。 正在处理80,000行并从5至900连续行相同的主题。 这样的组合的数量,来在几百万。 需要几个小时,一组80000行。必须每天做很多套。

Code is working perfect , but is very slow. Am dealing with 80,000 row and from 5 to 900 continuous rows of same subject. So the number of combinations , comes in a few millions. Takes few hours for one set of 80,000 rows. have to do many sets daily.

请建议如何加快这。 更好的算法/ code改进/不同的语言code 请帮助。

Please suggest how to speed this up. Better Algorithm / Code Improvements / Different Language to code Please assist.

推荐答案

下面是一个真正的访问(SQL)解决方案。

Here are the building blocks for a "real" Access (SQL) solution.

观察#1

在我看来,一个好的第一步是添加两个数字(长整型)列的[SourceTable]:

It seems to me that a good first step would be to add two Numeric (Long Integer) columns to the [SourceTable]:

[SubjectBlock]将行数的块,其中的主题是一致的。

[SubjectBlock] will number the "blocks" of rows where the subject is the same

[SubjectBlockSeq]将各块内按顺序编号的行

[SubjectBlockSeq] will sequentially number the rows within each block

他们都应该被索引(重复)。在code来填充这些列会...

They both should be indexed (Duplicates OK). The code to populate these columns would be...

Public Sub UpdateBlocksAndSeqs()
Dim cdb As DAO.Database, rst As DAO.Recordset
Dim BlockNo As Long, SeqNo As Long, PrevSubject As String

Set cdb = CurrentDb
Set rst = cdb.OpenRecordset("SELECT * FROM [SourceTable] ORDER BY [DLID]", dbOpenDynaset)

PrevSubject = "(an impossible value)"
BlockNo = 0
SeqNo = 0
DBEngine.Workspaces(0).BeginTrans  ''speeds up bulk updates
Do While Not rst.EOF
    If rst!Subject <> PrevSubject Then
        BlockNo = BlockNo + 1
        SeqNo = 0
    End If
    SeqNo = SeqNo + 1
    rst.Edit
    rst!SubjectBlock = BlockNo
    rst!SubjectBlockSeq = SeqNo
    rst.Update
    PrevSubject = rst!Subject
    rst.MoveNext
Loop
DBEngine.Workspaces(0).CommitTrans
rst.Close
Set rst = Nothing
End Sub

...和更新SourceTable会...

...and the updated SourceTable would be...

DLID    Subject   Total   SubjectBlock    SubjectBlockSeq
1       Science   70      1               1
2       Science   60      1               2
3       Science   75      1               3
4       Science   70      1               4
5       Maths     80      2               1
6       Maths     90      2               2
7       English   90      3               1
8       English   80      3               2
9       English   70      3               3
10      Science   75      4               1

(注意,我调整的测试数据,以使其更容易下面验证结果。)

(Note that I tweaked your test data to make it easier to verify the results below.)

现在,我们通过不断增加的迭代序列的长度应计入总我们可以迅速确定了块是感兴趣的只是通过查询像...

Now as we iterate through the ever-increasing "length of sequence to be included in the total" we can quickly identify the "blocks" that are of interest simply by using a query like...

SELECT SubjectBlock FROM SourceTable WHERE SubjectBlockSeq=3

...这将返回...

...which will return...

1
3

...表明计算总计为三跑的时候,我们将不再需要看块2(数学)和4个(最后一个科学的)都没有。

...indicating that when calculating the totals for a "run of 3" we won't need to look at blocks 2 ("Maths") and 4 (the last "Science" one) at all.

观察#2

第一次通过,当= numRows行1,是一个特例:它只是将文件从[SourceTable]到[预期结果表中的行。我们可以通过这样做,跟单的查询节省时间:

The first time through, when NumRows=1, is a special case: it just copies the rows from [SourceTable] into the [Expected Results] table. We can save time by doing that with a single query:

INSERT INTO ExpectedResult ( DLID, NumRows, Subject, Total, SubjectBlock, NextSubjectBlockSeq )
SELECT SourceTable.DLID, 1 AS Expr1, SourceTable.Subject, SourceTable.Total, 
    SourceTable.SubjectBlock, SourceTable.SubjectBlockSeq+1 AS Expr2
FROM SourceTable;

您可能会注意到,我已经添加了两列到[ExpectedResult表:[SubjectBlock](如前)和[NextSubjetBlockSeq](这只是[SubjectBlockSeq] +1)。此外,他们都应该被索引,从而重复。我们将使用在下面。

You may notice that I have added two columns to the [ExpectedResult] table: [SubjectBlock] (as before) and [NextSubjetBlockSeq] (which is just [SubjectBlockSeq]+1). Again, they should both be indexed, allowing duplicates. We'll use them below.

观察#3

,我们将继续寻找时间越来越长跑来概括,每次运行其实只是一个较早(较短)与上涨到年底的附加行运行。如果我们写我们的结果到[ExpectedResults表,因为我们走,我们可以重新使用这些值,而不是打扰回去和整个运行增加了个人价值。

As we continue looking for longer and longer "runs" to sum, each run is really just an earlier (shorter) run with an additional row tacked onto the end. If we write our results to the [ExpectedResults] table as we go along, we can re-use those values and not bother going back and adding up the individual values for the entire run.

在numRows行= 2,在附加行是那些其中 SubjectBlockSeq&GT; = 2 ...

When NumRows=2, the "add-on" rows are the ones where SubjectBlockSeq>=2...

SELECT SourceTable.*
FROM SourceTable
WHERE (((SourceTable.SubjectBlockSeq)>=2))
ORDER BY SourceTable.DLID;

...就是...

...that is...

DLID    Subject   Total SubjectBlock    SubjectBlockSeq
2       Science   60    1               2
3       Science   75    1               3
4       Science   70    1               4
6       Maths     90    2               2
8       English   80    3               2
9       English   70    3               3

...和[ExpectedResult]随着早期(短)运行上,我们将行套结的附加行是那些

...and the [ExpectedResult] rows with the "earlier (shorter) run" onto which we will be "tacking" the additional row are the ones

  • 从相同的[SubjectBlock]

  • from the same [SubjectBlock],

与[numRows行] = 1,以及

with [NumRows]=1, and

[ExpectedResult]。[NextSubjectBlockSeq] = [SourceTable]。[SubjectBlockSeq]

所以我们可以得到新的汇总,并将它们添加到[ExpectedResult]像这样

so we can get the new totals and append them to [ExpectedResult] like this

INSERT INTO ExpectedResult ( DLID, NumRows, Subject, Total, SubjectBlock, NextSubjectBlockSeq )
SELECT SourceTable.DLID, 2 AS Expr1, SourceTable.Subject, 
    [ExpectedResult].[Total]+[SourceTable].[Total] AS NewTotal, 
    SourceTable.SubjectBlock, [SourceTable].[SubjectBlockSeq]+1 AS Expr2
FROM SourceTable INNER JOIN ExpectedResult 
    ON (SourceTable.SubjectBlockSeq = ExpectedResult.NextSubjectBlockSeq) 
        AND (SourceTable.SubjectBlock = ExpectedResult.SubjectBlock)
WHERE (((SourceTable.SubjectBlockSeq)>=2) AND (ExpectedResult.NumRows=1));

添加到[ExpectedResult]中的行

The rows appended to [ExpectedResult] are

DLID    NumRows Subject   Total SubjectBlock    NextSubjectBlockSeq
2       2       Science   130   1               3
3       2       Science   135   1               4
4       2       Science   145   1               5
6       2       Maths     170   2               3
8       2       English   170   3               3
9       2       English   150   3               4

现在,我们乱打......

像以前一样使用同样的逻辑,我们现在可以处理为numRows行= 3。唯一的区别是,我们将插入值3到numRows行,我们的选择标准将是

Using the same logic as before, we can now process for NumRows=3. The only differences are that we will be inserting the value 3 into NumRows, and our selection criteria will be

WHERE (((SourceTable.SubjectBlockSeq)>=3) AND (ExpectedResult.NumRows=2))

完整的查询是

The complete query is

INSERT INTO ExpectedResult ( DLID, NumRows, Subject, Total, SubjectBlock, NextSubjectBlockSeq )
SELECT SourceTable.DLID, 3 AS Expr1, SourceTable.Subject, 
    [ExpectedResult].[Total]+[SourceTable].[Total] AS NewTotal, 
    SourceTable.SubjectBlock, [SourceTable].[SubjectBlockSeq]+1 AS Expr2
FROM SourceTable INNER JOIN ExpectedResult 
    ON (SourceTable.SubjectBlockSeq = ExpectedResult.NextSubjectBlockSeq) 
        AND (SourceTable.SubjectBlock = ExpectedResult.SubjectBlock)
WHERE (((SourceTable.SubjectBlockSeq)>=3) AND (ExpectedResult.NumRows=2));

和附加[ExpectedResult]中的行

and the rows appended to [ExpectedResult] are

DLID    NumRows Subject   Total SubjectBlock    NextSubjectBlockSeq
3       3       Science   205   1               4
4       3       Science   205   1               5
9       3       English   240   3               4

参数

由于每个连续查询是如此的相似,这将是非常不错的,如果我们可以只写一次,并多次使用它。幸运的是,我们可以,如果我们把它变成一个参数查询:

Since each successive query is so similar, it would be awfully nice if we could just write it once and use it repeatedly. Fortunately, we can, if we turn it into a "Parameter Query":

PARAMETERS TargetNumRows Long;
INSERT INTO ExpectedResult ( DLID, NumRows, Subject, Total, SubjectBlock, NextSubjectBlockSeq )
SELECT SourceTable.DLID, [TargetNumRows] AS Expr1, SourceTable.Subject, 
    [ExpectedResult].[Total]+[SourceTable].[Total] AS NewTotal, 
    SourceTable.SubjectBlock, [SourceTable].[SubjectBlockSeq]+1 AS Expr2
FROM SourceTable INNER JOIN ExpectedResult 
    ON (SourceTable.SubjectBlock = ExpectedResult.SubjectBlock) 
        AND (SourceTable.SubjectBlockSeq = ExpectedResult.NextSubjectBlockSeq)
WHERE (((SourceTable.SubjectBlockSeq)>=[TargetNumRows]) 
    AND ((ExpectedResult.NumRows)=[TargetNumRows]-1));

创建一个新的Access查询,粘贴到上面的SQL窗格中,然后将其保存为 pq_appendToExpectedResult 。 (以下简称pq_仅仅是一个视觉线索,它是一个参数查询。)

Create a new Access query, paste the above into the SQL pane, and then save it as pq_appendToExpectedResult. (The "pq_" is just a visual cue that it's a Parameter Query.)

从VBA调用参数查询

您可以调用(执行)参数查询在VBA通过QueryDef对象:

You can invoke (execute) a Parameter Query in VBA via a QueryDef object:

Dim cdb As DAO.Database, qdf As DAO.QueryDef
Set cdb = CurrentDb
Set qdf = cdb.QueryDefs("pq_appendToExpectedResult")
qdf!TargetNumRows = 4  '' parameter value
qdf.Execute
Set qdf = Nothing

何时停止

现在你可以看到,它只是递增的问题 numRows行并重新运行参数查询,但何时停止?这很简单:

Now you can see that it's simply a matter of incrementing NumRows and re-running the Parameter Query, but when to stop? That's easy:

递增的变量numRows行VBA中后,测试

After incrementing your NumRows variable in VBA, test

DCount("DLID", "SourceTable", "SubjectBlockSeq=" & NumRows)

如果它回来0,那么你就大功告成了。

If it comes back 0 then you're done.

显示我(全部)code

对不起,不是现在。 ;)一下这个,让我们知道如何去

Sorry, not right away. ;) Play around with this and let us know how it goes.

这篇关于SQL / VBScript中/智能算法的快速查找款项组合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆