Get a number of unique values without separating values that belong to the same block of values


Problem description

I'm OK with either a PL/SQL solution or an Access VBA/Excel VBA one (though Access VBA is preferred over Excel VBA). So PL/SQL is the first choice, Access VBA is second, and Excel VBA is third.

This is a very tough problem to explain. Please ask any questions and I will do my best to answer them clearly.

I have the following dataset in a table called NR_PVO_120. How do I pick out a number (which can change, but let's say 6) of UNIQUE OtherIDs without excluding any OtherIDs under any fax number?

So, if you pick the OtherID from row 7, you must also pick the OtherIDs from rows 8 and 9 because they have the same fax number. Basically, once you pick an OtherID, you're obligated to pick all OtherIDs that have the same fax number as the one you picked.

If the requested number (6 in this example) isn't possible, then the rule is "the closest number possible without exceeding it".

For example, if you take the OtherIDs from rows 1-10 you get 6 unique OtherIDs, but row 10 shares a fax with rows 11 and 12. You either need to take all 3 rows (but that raises the unique count to 7, which isn't acceptable) or skip this OtherID and find a fax that adds exactly 1 unique OtherID (for example, it can have 4 OtherIDs, but 3 of them already exist in the result set and therefore don't add to the unique count). My result of 6 UNIQUE OtherIDs needs to contain ALL OtherIDs under any fax the existing OtherIDs are connected to.

So one solution is to take rows 1-6 and 26. Another is to take rows 1-4 and 10-14. There are more, but you get the idea.
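The rule described above can be checked mechanically. The following is only a sketch in Python (not one of the languages asked for, chosen so it can be run on its own); the function name `expand_to_whole_faxes` and the in-memory `ROWS` list are assumptions for illustration, with the data copied from the sample table in this question. It expands any set of picked rows to whole fax blocks and counts the unique OtherIDs that result.

```python
# Sample data from the question: (row, OtherID, Fax)
ROWS = [
    (1, "11098554", "2063504752"), (2, "56200936", "2080906666"),
    (3, "11098554", "7182160901"), (4, "25138850", "7182160901"),
    (5, "56148974", "7182232046"), (6, "56530104", "7182234134"),
    (7, "25138850", "7182234166"), (8, "56148974", "7182234166"),
    (9, "11098554", "7182234166"), (10, "56597717", "7182248132"),
    (11, "56166294", "7182248132"), (12, "25138850", "7182248132"),
    (13, "56148974", "7182390090"), (14, "56226456", "7182390090"),
    (15, "56148974", "7182395285"), (16, "25138850", "7182395285"),
    (17, "56166614", "7180930966"), (18, "11098554", "7180930966"),
    (19, "56159509", "7180930966"), (20, "25138850", "7185462234"),
    (21, "56148974", "7185462234"), (22, "25138850", "7185465013"),
    (23, "56024315", "7185465013"), (24, "56115247", "7185465281"),
    (25, "25138850", "7185465281"), (26, "56148975", "7185466029"),
]


def expand_to_whole_faxes(picked_row_numbers):
    """Add every row that shares a fax with a picked row.

    Returns (sorted row numbers, set of unique OtherIDs) for the expanded pick.
    """
    picked = set(picked_row_numbers)
    faxes = {fax for row, _, fax in ROWS if row in picked}
    rows = sorted(row for row, _, fax in ROWS if fax in faxes)
    ids = {other_id for _, other_id, fax in ROWS if fax in faxes}
    return rows, ids


if __name__ == "__main__":
    for label, pick in [
        ("rows 1-10", range(1, 11)),
        ("rows 1-6 and 26", [1, 2, 3, 4, 5, 6, 26]),
        ("rows 1-4 and 10-14", [1, 2, 3, 4, 10, 11, 12, 13, 14]),
    ]:
        rows, ids = expand_to_whole_faxes(pick)
        print(label, "-> rows", rows, "unique OtherIDs:", len(ids))
```

On the sample data this reports 6 unique OtherIDs for rows 1-6 and 26. By the same count, rows 1-4 and 10-14 appear to come to 7 (rows 13 and 14 each add a new OtherID), so that second example may overshoot a target of 6.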

There will be many possibilities (the real dataset has tens of thousands of rows and the number of people requested will be around 10K). As long as all OtherIDs connected to all faxes in the result set are part of the requested number (6 in this case), any combination will do.

Some notes.

  1. Getting as close as possible to the requested number is a requirement.

  2. Some OtherIDs will have a blank fax. They should only be included as a last resort (when there aren't enough OtherIDs for the requested number).

How can this be done?

Row      OtherID        Fax
1       11098554    2063504752
2       56200936    2080906666
3       11098554    7182160901
4       25138850    7182160901
5       56148974    7182232046
6       56530104    7182234134
7       25138850    7182234166
8       56148974    7182234166
9       11098554    7182234166
10      56597717    7182248132
11      56166294    7182248132
12      25138850    7182248132
13      56148974    7182390090
14      56226456    7182390090
15      56148974    7182395285
16      25138850    7182395285
17      56166614    7180930966
18      11098554    7180930966
19      56159509    7180930966
20      25138850    7185462234
21      56148974    7185462234
22      25138850    7185465013
23      56024315    7185465013
24      56115247    7185465281
25      25138850    7185465281
26      56148975    7185466029

Some sample output

One solution is taking rows 1-6 and 26.

Row      OtherID        Fax
1       11098554    2063504752
2       56200936    2080906666
3       11098554    7182160901
4       25138850    7182160901
5       56148974    7182232046
6       56530104    7182234134
26      56148975    7185466029

Another solution is taking rows 1-4 and 10-14.

Row      OtherID        Fax
1       11098554    2063504752
2       56200936    2080906666
3       11098554    7182160901
4       25138850    7182160901
10      56597717    7182248132
11      56166294    7182248132
12      25138850    7182248132
13      56148974    7182390090
14      56226456    7182390090

And there are many more.

I only need FAX as my output.

This is for a fax campaign. We need to make sure no fax number is faxed twice and that all people connected to a fax number are contacted under the one fax sent.

So the idea is to take all OtherIDs under ANY fax you end up using.

EDIT: Here's how it's currently done; maybe this helps paint a picture.

The list is sorted by fax. They go down the list to a random point, MAKING SURE THE LAST RECORD IS THE LAST ONE FOR ITS FAX (a fax block is never split). So in my example they'd stop at one of rows 1, 2, 4, 5, 6, 9, 12, 14, 16, 19, 21, 23, 25, 26. They then see how many unique OtherIDs they have up to that point. If it's too many, they go up some and count again. If it's too few, they go down some and count again. They keep doing this until they get their unique number. The only requirement is to always include all OtherIDs under a fax.
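The manual procedure above can be sketched as follows. This is only an illustration in Python, assuming the rows are processed in the order listed in the sample table; the names `stopping_points` and `best_prefix` are made up for this sketch. It lists the rows where a fax block ends, together with the running count of unique OtherIDs, and then picks the stopping point that gets closest to the target without exceeding it.

```python
from itertools import groupby

# Sample data from the question, in the listed order: (row, OtherID, Fax)
ROWS = [
    (1, "11098554", "2063504752"), (2, "56200936", "2080906666"),
    (3, "11098554", "7182160901"), (4, "25138850", "7182160901"),
    (5, "56148974", "7182232046"), (6, "56530104", "7182234134"),
    (7, "25138850", "7182234166"), (8, "56148974", "7182234166"),
    (9, "11098554", "7182234166"), (10, "56597717", "7182248132"),
    (11, "56166294", "7182248132"), (12, "25138850", "7182248132"),
    (13, "56148974", "7182390090"), (14, "56226456", "7182390090"),
    (15, "56148974", "7182395285"), (16, "25138850", "7182395285"),
    (17, "56166614", "7180930966"), (18, "11098554", "7180930966"),
    (19, "56159509", "7180930966"), (20, "25138850", "7185462234"),
    (21, "56148974", "7185462234"), (22, "25138850", "7185465013"),
    (23, "56024315", "7185465013"), (24, "56115247", "7185465281"),
    (25, "25138850", "7185465281"), (26, "56148975", "7185466029"),
]


def stopping_points(rows):
    """Yield (last_row, unique_count_so_far) at the end of each run of equal faxes."""
    seen = set()
    for _, block in groupby(rows, key=lambda r: r[2]):
        block = list(block)
        seen.update(other_id for _, other_id, _ in block)
        yield block[-1][0], len(seen)


def best_prefix(rows, target):
    """Return the earliest stopping point whose count is closest to target without exceeding it."""
    best = None
    for last_row, count in stopping_points(rows):
        if count <= target and (best is None or count > best[1]):
            best = (last_row, count)
    return best


if __name__ == "__main__":
    print([row for row, _ in stopping_points(ROWS)])
    print(best_prefix(ROWS, 6))
```

Because this only ever considers a prefix of the list, the running count on the sample data jumps from 5 (after row 9) to 7 (after row 12) and never lands on 6, even though a non-contiguous pick such as rows 1-6 and 26 reaches 6. That gap is what an automated solution needs to close.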

Recommended answer

EDIT 2/13/2015: After using the accepted answer for a few months, I came across a scenario that hadn't happened before and realized that his solution only works if I need a number that's not too close to the total. For example, if my total number of records is 15000 and I ask for 12000, his code gives 10k or 11k. If I ask for 8k, I will probably get the 8k.

I don't understand what his code does and he never replied, so I can't explain why this happens. My guess is that he takes the counts in a certain order, and since the results depend on the order the faxes are sorted in, he won't necessarily get the best result every time. When there's enough room (asking for 8k out of 15k), any combination can yield an acceptable result, but once you ask for a tighter number (12k out of 15k) he's locked into his order and runs out of acceptable counts quickly.
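The kind of order dependence guessed at above is easy to reproduce on a toy case. The sketch below is in Python and uses three made-up fax blocks (`fax_x`, `fax_y`, `fax_z` are hypothetical, not from the real data, and this is not the accepted answer's code). It compares a largest-block-first greedy pick with an exhaustive search over every combination of blocks.

```python
from itertools import combinations

# Hypothetical fax blocks: fax -> set of OtherIDs (assumed data, for illustration only)
BLOCKS = {"fax_x": {1, 2, 3}, "fax_y": {4, 5}, "fax_z": {6, 7}}


def greedy(blocks, target):
    """Repeatedly take the block adding the most new IDs that still fits; return the count reached."""
    covered = set()
    while True:
        remaining = target - len(covered)
        fitting = [ids for ids in blocks.values() if 0 < len(ids - covered) <= remaining]
        if not fitting:
            return len(covered)
        covered |= max(fitting, key=lambda ids: len(ids - covered))


def exhaustive(blocks, target):
    """Try every combination of blocks; return the largest unique count not exceeding target."""
    best = 0
    names = list(blocks)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            count = len(set().union(*(blocks[name] for name in combo)))
            if best < count <= target:
                best = count
    return best


if __name__ == "__main__":
    print("greedy:", greedy(BLOCKS, 4))
    print("exhaustive:", exhaustive(BLOCKS, 4))
```

With a target of 4, the greedy pick takes the block of 3 first and then nothing else fits, while the two blocks of 2 hit the target exactly. The exhaustive search grows exponentially with the number of faxes, so it is only practical for tiny inputs like this one.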

So this is the code that will give the correct result no matter what. It's not nearly as elegant and it's extremely slow, but it works.

12/13/14: I think I got it, in PL/SQL. It's not the best solution by far, but it gives better results than what they currently get by hand. Actually, I would be really interested to hear about possible problems.

12/13/14 EDIT: The accepted answer is the way to do it. I'm only leaving this here for contrast, so people can see how not to code, lol.

DECLARE
     CountsNeededTotal NUMBER;
     CountsNeededRemaining NUMBER;
     CurCountsTotal NUMBER;
     CurFaxCount NUMBER;
     CurFaxCountPicked NUMBER;
BEGIN
     CountsNeededTotal := 420;
     CurCountsTotal := 0;
     CurFaxCount := 0;

     CountsNeededRemaining := CountsNeededTotal - CurCountsTotal;

     EXECUTE IMMEDIATE 'TRUNCATE TABLE NR_PVO_121';


     --############################################################################################
     --############################################################################################
     --############################################################################################
     --############################################################################################
     --############################################################################################
     --START BLOCK
     --this block just gets the first fax, the fax with the largest number of people
     --############################################################################################
     --############################################################################################
     --############################################################################################
     --############################################################################################
     --############################################################################################

     --get the first fax with the most people as long as that number isn't larger than the number needed
     SELECT MAX(CountOfPeople) CountOfPeople
    INTO CurFaxCount
    FROM (SELECT     fax
            ,COUNT(1) CountOfPeople
           FROM NR_PVO_120
          GROUP BY Fax
         HAVING COUNT(1) <= CountsNeededRemaining);

     COMMIT;

     --if there is a number that's not larger then add to the table and keep looping
     --if there isn't then there's no providers from this campaign that can be used
     IF CurFaxCount >= 0 THEN
       --insert into the 121 table (final list of faxes)
       --take exactly one fax with that count; on a tie, inserting every tied fax could overshoot the total
       INSERT INTO NR_PVO_121
         SELECT Fax
               ,CountOfPeople
           FROM (SELECT   fax
                         ,COUNT(1) CountOfPeople
                     FROM NR_PVO_120
                 GROUP BY Fax
                   HAVING COUNT(1) = CurFaxCount
                 ORDER BY Fax)
          WHERE ROWNUM = 1;



       COMMIT;

       --############################################################################################
       --############################################################################################
       --############################################################################################
       --############################################################################################
       --############################################################################################
       --START BLOCK
       --this block loops through remaining faxes
       --############################################################################################
       --############################################################################################
       --############################################################################################
       --############################################################################################
       --############################################################################################



       SELECT SUM(CountOfPeople) INTO CurCountsTotal FROM NR_PVO_121;


       IF CurCountsTotal < CountsNeededTotal THEN
         CountsNeededRemaining := CountsNeededTotal - CurCountsTotal;


         --loop until counts needed remaining is 0 or as close as 0 as possible without going in the negative
         WHILE CountsNeededRemaining >= 0 LOOP
              --clear 122 table
              EXECUTE IMMEDIATE 'TRUNCATE TABLE NR_PVO_122';


              --loop through all faxes in 120 table  MINUS the ones in the 121 table
              DECLARE
                CURSOR CurRec  IS
                  SELECT DISTINCT Fax
                    FROM NR_PVO_120
                   WHERE Fax NOT IN (SELECT Fax FROM NR_PVO_121);
                PVO CurRec%ROWTYPE;
              BEGIN
                OPEN CurRec;
                LOOP
                  FETCH CurRec INTO PVO;
                  --exit right after the fetch; testing %NOTFOUND at the end of the loop processes the last fax twice
                  EXIT WHEN CurRec%NOTFOUND;

                  --count the OtherIDs this fax would add that are not already covered by faxes in the 121 table
                  SELECT COUNT(DISTINCT OtherID) CountOfPeople
                    INTO CurFaxCount
                    FROM NR_PVO_120
                   WHERE     Fax = PVO.fax
                      AND OtherID NOT IN (SELECT DISTINCT OtherID
                                   FROM NR_PVO_120
                                  WHERE fax IN (SELECT Fax FROM NR_PVO_121));
                  --                                                          DBMS_OUTPUT.put_line('CurFaxCount ' || CurFaxCount);
                  --                                                          DBMS_OUTPUT.put_line('CountsNeededRemaining ' || CountsNeededRemaining);

                  IF CurFaxCount <= CountsNeededRemaining THEN
                    --record their unique counts in 122 table IF THEY'RE NOT LARGER THAN CountsNeededRemaining
                    INSERT INTO NR_PVO_122
                         SELECT PVO.fax
                            ,CurFaxCount
                        FROM DUAL;

                    COMMIT;
                  END IF;
                --end fax loop
                END LOOP;
                CLOSE CurRec;
              END;


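              --pick the highest count from 122 table
              SELECT MAX(CountOfPeople) CountOfPeople INTO CurFaxCountPicked FROM NR_PVO_122;

              --no remaining fax fits: stop here instead of inserting a NULL fax into the 121 table
              EXIT WHEN CurFaxCountPicked IS NULL;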

              --add this fax to the 121 table
              INSERT INTO NR_PVO_121
                SELECT MIN(Fax) Fax
                   ,CurFaxCountPicked
                  FROM NR_PVO_122
                 WHERE CountOfPeople = CurFaxCountPicked;


              COMMIT;
              --add the counts to the CurCountsTotal
              CurCountsTotal := CurCountsTotal + CurFaxCountPicked;
              --recalc   CountsNeededRemaining
              CountsNeededRemaining := CountsNeededTotal - CurCountsTotal;
              --
              --                                                          DBMS_OUTPUT.put_line('CurCountsTotal ' || CurCountsTotal);
              --                                                          DBMS_OUTPUT.put_line('CurFaxCountPicked ' || CurFaxCountPicked);
              --                                                          DBMS_OUTPUT.put_line('CurFaxCount ' || CurFaxCount);
              --                                                          DBMS_OUTPUT.put_line('CountsNeededRemaining ' || CountsNeededRemaining);
              --                                                          DBMS_OUTPUT.put_line('CountsNeededTotal ' || CountsNeededTotal);

              --clear 122 table
              EXECUTE IMMEDIATE 'TRUNCATE TABLE NR_PVO_122';
         --end while loop
         END LOOP;
       END IF;
     --############################################################################################
     --############################################################################################
     --############################################################################################
     --############################################################################################
     --############################################################################################
     --END BLOCK
     --this block loops through remaining faxes
     --############################################################################################
     --############################################################################################
     --############################################################################################
     --############################################################################################
     --############################################################################################



     END IF;
--############################################################################################
--############################################################################################
--############################################################################################
--############################################################################################
--############################################################################################
--END BLOCK
--this block just gets the first fax, the fax with the largest number of people
--############################################################################################
--############################################################################################
--############################################################################################
--############################################################################################
--############################################################################################



END;

Here's a better version, MUCH faster than the above, but it probably won't return perfect results in some cases. I wasn't able to get wrong results while testing, but there is a possibility, because I'm not trying every possible combination (as in the first version), which takes days to finish for a dataset of 20K records.

DECLARE
    CountsNeededTotal NUMBER;
    CountsNeededRemaining NUMBER;
    CurCountsTotal NUMBER;
BEGIN
    CurCountsTotal := 0;

    SELECT NoOfProvToKeep INTO CountsNeededTotal FROM NR_PVO_121;

    CountsNeededRemaining := CountsNeededTotal - CurCountsTotal;

    EXECUTE IMMEDIATE 'TRUNCATE TABLE nr_pvo_122';


    COMMIT;

    IF CurCountsTotal <= CountsNeededTotal THEN
        --loop until counts needed remaining is 0 or as close as 0 as possible without going in the negative
        WHILE CountsNeededRemaining > 0 LOOP
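            --add the fax that contributes the most new OtherIDs without exceeding CountsNeededRemaining
            INSERT INTO NR_PVO_122
                SELECT Fax
                      ,CountOfPeople
                  FROM (SELECT   COUNT(DISTINCT OtherID) CountOfPeople
                               ,Fax
                       FROM NR_PVO_120
                      WHERE OtherID NOT IN (SELECT DISTINCT OtherID
                                    FROM NR_PVO_120
                                   WHERE fax IN (SELECT Fax FROM NR_PVO_122))
                        GROUP BY fax
                     HAVING COUNT(DISTINCT OtherID) <= CountsNeededRemaining
                        ORDER BY 1 DESC)
                 WHERE ROWNUM = 1;

            --no remaining fax fits: stop, otherwise the remaining count never changes and the loop never ends
            EXIT WHEN SQL%ROWCOUNT = 0;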



            SELECT SUM(CountOfPeople) INTO CurCountsTotal FROM NR_PVO_122;

            COMMIT;
            --recalc   CountsNeededRemaining
            CountsNeededRemaining := CountsNeededTotal - CurCountsTotal;
        --
        --DBMS_OUTPUT.put_line('CurCountsTotal ' || CurCountsTotal || ', CountsNeededRemaining ' || CountsNeededRemaining);
        --end while loop
        END LOOP;
    END IF;



    DELETE FROM NR_PVO_112
          WHERE NVL(Fax, '999999999999') NOT IN (SELECT Fax FROM NR_PVO_122);
END;
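For readers who find the loop above hard to follow, here is the same general idea as a small in-memory sketch in Python. It is an approximation under stated assumptions, not a translation of the PL/SQL: `greedy_pick` is a made-up name, the data is the sample table from the question, and ties are broken by first appearance in the list rather than by whatever order the database returns. Each pass adds the fax that contributes the most OtherIDs not yet covered, as long as that does not push the unique count past the target.

```python
# Sample data from the question: (row, OtherID, Fax)
ROWS = [
    (1, "11098554", "2063504752"), (2, "56200936", "2080906666"),
    (3, "11098554", "7182160901"), (4, "25138850", "7182160901"),
    (5, "56148974", "7182232046"), (6, "56530104", "7182234134"),
    (7, "25138850", "7182234166"), (8, "56148974", "7182234166"),
    (9, "11098554", "7182234166"), (10, "56597717", "7182248132"),
    (11, "56166294", "7182248132"), (12, "25138850", "7182248132"),
    (13, "56148974", "7182390090"), (14, "56226456", "7182390090"),
    (15, "56148974", "7182395285"), (16, "25138850", "7182395285"),
    (17, "56166614", "7180930966"), (18, "11098554", "7180930966"),
    (19, "56159509", "7180930966"), (20, "25138850", "7185462234"),
    (21, "56148974", "7185462234"), (22, "25138850", "7185465013"),
    (23, "56024315", "7185465013"), (24, "56115247", "7185465281"),
    (25, "25138850", "7185465281"), (26, "56148975", "7185466029"),
]


def greedy_pick(rows, target):
    """Return (chosen faxes, covered OtherIDs) with at most `target` unique OtherIDs."""
    ids_by_fax = {}
    for _, other_id, fax in rows:
        ids_by_fax.setdefault(fax, set()).add(other_id)

    chosen, covered = [], set()
    while len(covered) < target:
        remaining = target - len(covered)
        best_fax, best_gain = None, 0
        for fax, ids in ids_by_fax.items():
            if fax in chosen:
                continue
            gain = len(ids - covered)  # new OtherIDs this fax would add
            if best_gain < gain <= remaining:
                best_fax, best_gain = fax, gain
        if best_fax is None:  # nothing fits any more: stop short of the target
            break
        chosen.append(best_fax)
        covered |= ids_by_fax[best_fax]
    return chosen, covered


if __name__ == "__main__":
    faxes, ids = greedy_pick(ROWS, 6)
    print(faxes, len(ids))
```

Every chosen fax brings all of its OtherIDs with it, so the "never split a fax" rule holds by construction. Like the PL/SQL version, this is a heuristic and may stop short of the target on other data.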
