Get a number of unique values without separating values that belong to the same block of values
Problem description
I'm OK with either a PL/SQL solution or an Access VBA/Excel VBA one (though Access VBA is preferred over Excel VBA). So PL/SQL is the first choice, Access VBA is second and Excel VBA is third.
This is a very tough problem to explain. Please ask any questions and I will do my best to answer them clearly.
I have the following dataset in a table called NR_PVO_120. How do I pick out a number (which can change, but let's say 6) of UNIQUE OtherIDs without excluding any OtherIDs under any fax number?
So, if you pick the OtherID from row 7, you must also pick the OtherIDs from rows 8 and 9, because they have the same fax number. Basically, once you pick an OtherID, you're obligated to pick all OtherIDs that have the same fax number as the one you picked.
If the number requested (6 for this example) isn't possible then "the closest number possible but not exceeding" would be the rule.
For example, if you take the OtherIDs from rows 1-10 you get 6 unique OtherIDs, but row 10 shares a fax with rows 11 and 12. You either need to take all 3 (but that raises the unique count to 7, which isn't acceptable) or skip this OtherID and find one with a fax that adds exactly 1 unique OtherID (for example, a fax can have 4 OtherIDs, 3 of which are already in the result set and therefore don't add to the unique count). My result of 6 UNIQUE OtherIDs must contain ALL OtherIDs under every fax the selected OtherIDs are connected to.
So one solution is to take rows 1-6 and 26. Another is to take rows 1-4 and 10-14. There are more, but you get the idea.
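To make the counting rule concrete, here is a small Python sketch (an illustration, not PL/SQL and not part of the original question) that loads the sample rows, groups OtherIDs by fax and counts the distinct OtherIDs reached by a set of faxes. The helper names `fax_blocks`, `unique_count` and `faxes_of` are hypothetical. It reproduces the example above: the faxes touched by rows 1-10 pull in rows 11 and 12 and reach 7 unique OtherIDs, while rows 1-6 plus row 26 reach exactly 6.

```python
from collections import defaultdict

# Sample rows from NR_PVO_120 as given in the question: (Row, OtherID, Fax).
ROWS = [
    (1, 11098554, 2063504752), (2, 56200936, 2080906666),
    (3, 11098554, 7182160901), (4, 25138850, 7182160901),
    (5, 56148974, 7182232046), (6, 56530104, 7182234134),
    (7, 25138850, 7182234166), (8, 56148974, 7182234166),
    (9, 11098554, 7182234166), (10, 56597717, 7182248132),
    (11, 56166294, 7182248132), (12, 25138850, 7182248132),
    (13, 56148974, 7182390090), (14, 56226456, 7182390090),
    (15, 56148974, 7182395285), (16, 25138850, 7182395285),
    (17, 56166614, 7180930966), (18, 11098554, 7180930966),
    (19, 56159509, 7180930966), (20, 25138850, 7185462234),
    (21, 56148974, 7185462234), (22, 25138850, 7185465013),
    (23, 56024315, 7185465013), (24, 56115247, 7185465281),
    (25, 25138850, 7185465281), (26, 56148975, 7185466029),
]


def fax_blocks(rows):
    """Group rows into blocks: fax -> set of OtherIDs under that fax."""
    blocks = defaultdict(set)
    for _row, other_id, fax in rows:
        blocks[fax].add(other_id)
    return blocks


def faxes_of(rows, row_numbers):
    """Faxes touched by the given row numbers."""
    return {fax for row, _other_id, fax in rows if row in row_numbers}


def unique_count(blocks, faxes):
    """Distinct OtherIDs reached when every OtherID under each chosen fax is taken."""
    chosen = set()
    for fax in faxes:
        chosen |= blocks[fax]
    return len(chosen)


blocks = fax_blocks(ROWS)
# Rows 1-10 touch fax 7182248132, so rows 11 and 12 come along with it.
print(unique_count(blocks, faxes_of(ROWS, set(range(1, 11)))))  # 7
# First solution from the question: rows 1-6 and 26.
print(unique_count(blocks, faxes_of(ROWS, set(range(1, 7)) | {26})))  # 6
```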
There will be many possibilities (the real dataset has tens of thousands of rows, and the number of people requested will be around 10K). As long as all OtherIDs connected to all faxes in the result set are part of the requested number (6 in this case), any combination will do.
A few notes:
- Getting as close as possible to the requested number is a requirement.
- Some OtherIDs will have a blank fax; they should only be included as a last resort (when there aren't enough OtherIDs for the requested number).
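One way to honour both notes is a greedy pass: keep adding the fax that contributes the most new OtherIDs without overshooting, and only fall back to blank-fax OtherIDs once no fax fits. The Python sketch below assumes input rows as `(OtherID, Fax)` pairs with `None` or an empty string for a blank fax, and treats each blank-fax OtherID as its own single-person block; `pick_faxes` and the toy data are hypothetical. Being greedy, it gets close to the target but is not guaranteed to find the best possible total.

```python
def pick_faxes(rows, target):
    """Greedy sketch: repeatedly take the fax that adds the most new OtherIDs
    without exceeding target; blank-fax OtherIDs only top up at the end."""
    blocks = {}      # fax -> set of OtherIDs under it
    blank_ids = []   # OtherIDs with a blank fax, in input order (assumption)
    for other_id, fax in rows:
        if fax is None or fax == "":
            if other_id not in blank_ids:
                blank_ids.append(other_id)
        else:
            blocks.setdefault(fax, set()).add(other_id)

    chosen_faxes, chosen_ids = [], set()
    remaining = set(blocks)
    while len(chosen_ids) < target:
        best_fax, best_gain = None, 0
        for fax in sorted(remaining):
            gain = len(blocks[fax] - chosen_ids)  # only OtherIDs not yet covered count
            if best_gain < gain <= target - len(chosen_ids):
                best_fax, best_gain = fax, gain
        if best_fax is None:
            break  # no fax adds new OtherIDs without overshooting
        chosen_faxes.append(best_fax)
        chosen_ids |= blocks[best_fax]  # take ALL OtherIDs under the fax
        remaining.discard(best_fax)

    # Last resort: OtherIDs with a blank fax, one at a time.
    extra_ids = []
    for other_id in blank_ids:
        if len(chosen_ids) >= target:
            break
        if other_id not in chosen_ids:
            chosen_ids.add(other_id)
            extra_ids.append(other_id)
    return chosen_faxes, extra_ids, len(chosen_ids)


# Hypothetical toy data: (OtherID, Fax); OtherID 4 sits under faxes B and C.
ROWS = [(1, "A"), (2, "A"), (3, "B"), (4, "B"), (4, "C"), (5, "C"), (6, "D"), (7, None)]
print(pick_faxes(ROWS, 6))  # (['A', 'B', 'C', 'D'], [], 6)
print(pick_faxes(ROWS, 7))  # (['A', 'B', 'C', 'D'], [7], 7)
```

With a target of 6 the four faxes cover exactly 6 people; with a target of 7 no further fax exists, so the blank-fax OtherID 7 is used to top up.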
How can this be done?
Row OtherID Fax
1 11098554 2063504752
2 56200936 2080906666
3 11098554 7182160901
4 25138850 7182160901
5 56148974 7182232046
6 56530104 7182234134
7 25138850 7182234166
8 56148974 7182234166
9 11098554 7182234166
10 56597717 7182248132
11 56166294 7182248132
12 25138850 7182248132
13 56148974 7182390090
14 56226456 7182390090
15 56148974 7182395285
16 25138850 7182395285
17 56166614 7180930966
18 11098554 7180930966
19 56159509 7180930966
20 25138850 7185462234
21 56148974 7185462234
22 25138850 7185465013
23 56024315 7185465013
24 56115247 7185465281
25 25138850 7185465281
26 56148975 7185466029
Sample output
One solution is taking rows 1-6 and 26.
Row OtherID Fax
1 11098554 2063504752
2 56200936 2080906666
3 11098554 7182160901
4 25138850 7182160901
5 56148974 7182232046
6 56530104 7182234134
26 56148975 7185466029
Another solution is taking rows 1-4 and 10-14.
Row OtherID Fax
1 11098554 2063504752
2 56200936 2080906666
3 11098554 7182160901
4 25138850 7182160901
10 56597717 7182248132
11 56166294 7182248132
12 25138850 7182248132
13 56148974 7182390090
14 56226456 7182390090
There are many more.
I only need the fax numbers as my output.
This is for a fax campaign: we need to make sure no fax number is faxed twice, and that all people connected to a fax number are contacted with the one fax sent.
So the idea is to take all OtherIDs under ANY fax you end up using.
EDIT: here's how it's currently done; maybe this helps paint a picture.
The list is sorted by fax, and they go down the list to a random point, MAKING SURE THE LAST RECORD TAKEN IS THE LAST ONE FOR ITS FAX. So in my example they'd stop at one of rows 1, 2, 4, 5, 6, 9, 12, 14, 16, 19, 21, 23, 25 or 26. They then see how many unique OtherIDs they have up to that point. If it's too many, they go up some and see how many they have; if it's too few, they go down some and see how many they have. They keep doing this until they get their unique number. The only requirement is to always include all OtherIDs under a fax.
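The manual process above amounts to trying cut points that fall on fax boundaries. A Python sketch of that idea, assuming rows as `(OtherID, Fax)` pairs (`best_prefix_cut` is a hypothetical name): sort by fax, add one whole fax block at a time, and remember the last cut whose unique count does not exceed the target. Because the count never decreases as more rows are taken, the scan can stop at the first overshoot instead of probing up and down by hand.

```python
from itertools import groupby

# Sample rows from the question as (OtherID, Fax).
ROWS = [
    (11098554, 2063504752), (56200936, 2080906666), (11098554, 7182160901),
    (25138850, 7182160901), (56148974, 7182232046), (56530104, 7182234134),
    (25138850, 7182234166), (56148974, 7182234166), (11098554, 7182234166),
    (56597717, 7182248132), (56166294, 7182248132), (25138850, 7182248132),
    (56148974, 7182390090), (56226456, 7182390090), (56148974, 7182395285),
    (25138850, 7182395285), (56166614, 7180930966), (11098554, 7180930966),
    (56159509, 7180930966), (25138850, 7185462234), (56148974, 7185462234),
    (25138850, 7185465013), (56024315, 7185465013), (56115247, 7185465281),
    (25138850, 7185465281), (56148975, 7185466029),
]


def best_prefix_cut(rows, target):
    """Return (last_fax, unique_count) for the longest fax-sorted prefix,
    cut only at the end of a fax block, whose unique OtherID count <= target."""
    ordered = sorted(rows, key=lambda r: r[1])  # sort by fax
    seen = set()
    best = (None, 0)
    for fax, group in groupby(ordered, key=lambda r: r[1]):
        seen.update(other_id for other_id, _fax in group)  # whole fax block at once
        if len(seen) > target:
            break  # counts only grow, so every later cut overshoots too
        best = (fax, len(seen))
    return best


print(best_prefix_cut(ROWS, 6))  # (7182232046, 6)
print(best_prefix_cut(ROWS, 3))  # (2080906666, 2)
```

With the rows sorted numerically by fax, a target of 6 stops after fax 7182232046. A target of 3 can only reach 2, because the next block (fax 7180930966) adds two new people at once and a prefix cut cannot skip a block.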
Recommended answer
EDIT 2/13/2015: after using the accepted answer for a few months, I came across a scenario that hadn't happened before and realized that his solution only works if I need a number that's not too close to the total. For example, if my total number of records is 15,000 and I ask for 12,000, his code gives 10K or 11K. If I ask for 8K, I will probably get the 8K.
I don't understand what his code does and he never replied, so I can't explain why this happens. My guess is that he takes the counts in a certain order, and since the results depend on the order the faxes are sorted in, he won't necessarily get the best result every time. When there's enough room (asking for 8K out of 15K), almost any combination yields an acceptable result, but once you ask for a tighter number (12K out of 15K) he's locked into his order and runs out of acceptable counts quickly.
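The guess about ordering can be illustrated with a toy case (hypothetical faxes A, B and C, not taken from the question's data). In this Python sketch, a largest-first greedy pass takes fax A with 4 people and then cannot fit either 3-person fax under a target of 6, while an exhaustive search over all fax combinations finds B + C = 6. The exhaustive loop checks 2^n - 1 subsets, so it is only practical for very small inputs.

```python
from itertools import combinations

# Hypothetical fax -> OtherIDs blocks.
BLOCKS = {
    "A": {1, 2, 3, 4},
    "B": {5, 6, 7},
    "C": {8, 9, 10},
}


def greedy_largest_first(blocks, target):
    """Visit faxes from largest to smallest; take each one that still fits."""
    chosen = set()
    for fax in sorted(blocks, key=lambda f: len(blocks[f]), reverse=True):
        if len(chosen | blocks[fax]) <= target:
            chosen |= blocks[fax]
    return len(chosen)


def exhaustive(blocks, target):
    """Try every non-empty combination of faxes; keep the best count <= target."""
    best = 0
    faxes = list(blocks)
    for k in range(1, len(faxes) + 1):
        for combo in combinations(faxes, k):
            ids = set().union(*(blocks[f] for f in combo))
            if best < len(ids) <= target:
                best = len(ids)
    return best


print(greedy_largest_first(BLOCKS, 6))  # 4
print(exhaustive(BLOCKS, 6))            # 6
```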
So this is the code that will give the correct result no matter what. It's not nearly as elegant and is extremely slow, but it works.
12/13/14: I think I got it. It's PL/SQL, not the best solution by far, but it gives better results than what they currently get by hand. I'd actually be really interested to hear about possible problems.
12/13/14 EDIT: the accepted answer is the way to do it; I'm only leaving this here for contrast, so people can see how not to code, lol.
DECLARE
  CountsNeededTotal     NUMBER;
  CountsNeededRemaining NUMBER;
  CurCountsTotal        NUMBER;
  CurFaxCount           NUMBER;
  CurFaxCountPicked     NUMBER;
BEGIN
  CountsNeededTotal     := 420;
  CurCountsTotal        := 0;
  CurFaxCount           := 0;
  CountsNeededRemaining := CountsNeededTotal - CurCountsTotal;
  EXECUTE IMMEDIATE 'TRUNCATE TABLE NR_PVO_121';
  --############################################################################################
  --START BLOCK
  --this block just gets the first fax, the fax with the largest number of people
  --############################################################################################
  --get the first fax with the most people as long as that number isn't larger than the number needed
  --(COUNT(DISTINCT OtherID) so an OtherID listed twice under one fax is counted once;
  -- blank faxes are skipped because they are only a last resort)
  SELECT MAX(CountOfPeople) CountOfPeople
    INTO CurFaxCount
    FROM (SELECT Fax
                ,COUNT(DISTINCT OtherID) CountOfPeople
            FROM NR_PVO_120
           WHERE Fax IS NOT NULL
           GROUP BY Fax
          HAVING COUNT(DISTINCT OtherID) <= CountsNeededRemaining);
  --if there is a number that's not larger then add to the table and keep looping
  --if there isn't (MAX returns NULL) then there are no providers from this campaign that can be used
  IF CurFaxCount IS NOT NULL THEN
    --insert into the 121 table (final list of faxes);
    --ROWNUM = 1 keeps a single fax when several faxes tie for the maximum
    INSERT INTO NR_PVO_121
      SELECT Fax
            ,CountOfPeople
        FROM (SELECT Fax
                    ,COUNT(DISTINCT OtherID) CountOfPeople
                FROM NR_PVO_120
               WHERE Fax IS NOT NULL
               GROUP BY Fax
              HAVING COUNT(DISTINCT OtherID) = CurFaxCount
               ORDER BY Fax)
       WHERE ROWNUM = 1;
    COMMIT;
    --############################################################################################
    --START BLOCK
    --this block loops through remaining faxes
    --############################################################################################
    SELECT SUM(CountOfPeople) INTO CurCountsTotal FROM NR_PVO_121;
    IF CurCountsTotal < CountsNeededTotal THEN
      CountsNeededRemaining := CountsNeededTotal - CurCountsTotal;
      --loop until counts needed remaining is 0 or as close to 0 as possible without going negative
      WHILE CountsNeededRemaining >= 0 LOOP
        --clear 122 table
        EXECUTE IMMEDIATE 'TRUNCATE TABLE NR_PVO_122';
        --loop through all faxes in 120 table MINUS the ones in the 121 table
        DECLARE
          CURSOR CurRec IS
            SELECT DISTINCT Fax
              FROM NR_PVO_120
             WHERE Fax IS NOT NULL
               AND Fax NOT IN (SELECT Fax FROM NR_PVO_121);
          PVO CurRec%ROWTYPE;
        BEGIN
          OPEN CurRec;
          LOOP
            FETCH CurRec INTO PVO;
            --exit straight after FETCH, otherwise the last fax is processed a second time
            EXIT WHEN CurRec%NOTFOUND;
            --count the OtherIDs under this fax that are not already covered by a picked fax
            SELECT COUNT(DISTINCT OtherID) CountOfPeople
              INTO CurFaxCount
              FROM NR_PVO_120
             WHERE Fax = PVO.Fax
               AND OtherID NOT IN (SELECT OtherID
                                     FROM NR_PVO_120
                                    WHERE Fax IN (SELECT Fax FROM NR_PVO_121));
            -- DBMS_OUTPUT.put_line('CurFaxCount ' || CurFaxCount);
            -- DBMS_OUTPUT.put_line('CountsNeededRemaining ' || CountsNeededRemaining);
            IF CurFaxCount <= CountsNeededRemaining THEN
              --record their unique counts in 122 table IF THEY'RE NOT LARGER THAN CountsNeededRemaining
              INSERT INTO NR_PVO_122
                SELECT PVO.Fax
                      ,CurFaxCount
                  FROM DUAL;
              COMMIT;
            END IF;
          --end fax loop
          END LOOP;
          CLOSE CurRec;
        END;
        --pick the highest count from 122 table
        SELECT MAX(CountOfPeople) CountOfPeople INTO CurFaxCountPicked FROM NR_PVO_122;
        --MAX returns NULL when no remaining fax fits, so stop instead of relying on NULL arithmetic
        EXIT WHEN CurFaxCountPicked IS NULL;
        --add this fax to the 121 table
        INSERT INTO NR_PVO_121
          SELECT MIN(Fax) Fax
                ,CurFaxCountPicked
            FROM NR_PVO_122
           WHERE CountOfPeople = CurFaxCountPicked;
        COMMIT;
        --add the counts to the CurCountsTotal
        CurCountsTotal := CurCountsTotal + CurFaxCountPicked;
        --recalc CountsNeededRemaining
        CountsNeededRemaining := CountsNeededTotal - CurCountsTotal;
        -- DBMS_OUTPUT.put_line('CurCountsTotal ' || CurCountsTotal);
        -- DBMS_OUTPUT.put_line('CurFaxCountPicked ' || CurFaxCountPicked);
        -- DBMS_OUTPUT.put_line('CountsNeededRemaining ' || CountsNeededRemaining);
        --clear 122 table
        EXECUTE IMMEDIATE 'TRUNCATE TABLE NR_PVO_122';
      --end while loop
      END LOOP;
    END IF;
    --############################################################################################
    --END BLOCK (loops through remaining faxes)
    --############################################################################################
  END IF;
  --############################################################################################
  --END BLOCK (gets the first fax, the fax with the largest number of people)
  --############################################################################################
END;
Here's a better version, MUCH faster than the one above, but it probably won't return perfect results in some cases. I wasn't able to get wrong results while testing, but it's possible, because I'm not trying every possible combination (as in the first version), which takes days to finish for a dataset of 20K records.
DECLARE
  CountsNeededTotal     NUMBER;
  CountsNeededRemaining NUMBER;
  CurCountsTotal        NUMBER;
BEGIN
  CurCountsTotal := 0;
  SELECT NoOfProvToKeep INTO CountsNeededTotal FROM NR_PVO_121;
  CountsNeededRemaining := CountsNeededTotal - CurCountsTotal;
  EXECUTE IMMEDIATE 'TRUNCATE TABLE nr_pvo_122';
  --loop until counts needed remaining is 0 or as close to 0 as possible without going negative
  WHILE CountsNeededRemaining > 0 LOOP
    --add the one fax that brings in the most OtherIDs not yet covered by the faxes already in 122,
    --as long as that number doesn't exceed CountsNeededRemaining
    INSERT INTO NR_PVO_122
      SELECT Fax
            ,CountOfPeople
        FROM (SELECT COUNT(DISTINCT OtherID) CountOfPeople
                    ,Fax
                FROM NR_PVO_120
               WHERE Fax IS NOT NULL --blank faxes are a last resort; a NULL in 122 would also break the NOT IN checks
                 AND OtherID NOT IN (SELECT OtherID
                                       FROM NR_PVO_120
                                      WHERE Fax IN (SELECT Fax FROM NR_PVO_122))
               GROUP BY Fax
              HAVING COUNT(DISTINCT OtherID) <= CountsNeededRemaining
               ORDER BY 1 DESC)
       WHERE ROWNUM = 1;
    --no fax fits any more: stop, otherwise CountsNeededRemaining never changes and the loop never ends
    EXIT WHEN SQL%ROWCOUNT = 0;
    COMMIT;
    SELECT SUM(CountOfPeople) INTO CurCountsTotal FROM NR_PVO_122;
    --recalc CountsNeededRemaining
    CountsNeededRemaining := CountsNeededTotal - CurCountsTotal;
    --DBMS_OUTPUT.put_line('CurCountsTotal ' || CurCountsTotal || ', CountsNeededRemaining ' || CountsNeededRemaining);
  --end while loop
  END LOOP;
  --delete the rows whose fax was not picked (blank faxes map to a dummy value and are deleted too)
  DELETE FROM NR_PVO_112
   WHERE NVL(Fax, '999999999999') NOT IN (SELECT Fax FROM NR_PVO_122);
END;