How to prune duplicate associations to yield a unique most-complete set
Question
I hardly know how to state this question, let alone search for answers. But here's my best shot. Assume I have a table
Col1 | Col2
-----+-----
A    | 1
A    | 2
A    | 3
A    | 4
B    | 1
B    | 2
B    | 3
C    | 1
C    | 2
C    | 3
D    | 1
I want to find the subset of associations (rows) where:
- There are no duplicates in Col1
- There are no duplicates in Col2
- Every value in Col1 is associated with a value in Col2
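The three conditions above can be checked mechanically for any candidate subset. Here is a minimal Python sketch, assuming the table is held as a list of (Col1, Col2) tuples; the function name `is_complete_unique` is hypothetical and not part of the original post:

```python
def is_complete_unique(rows, subset):
    """Check a candidate subset of rows against the three conditions."""
    rows = set(rows)
    col1 = [left for left, _ in subset]
    col2 = [right for _, right in subset]
    return (
        all(pair in rows for pair in subset)          # only rows that exist in the table
        and len(col1) == len(set(col1))               # no duplicates in Col1
        and len(col2) == len(set(col2))               # no duplicates in Col2
        and set(col1) == {left for left, _ in rows}   # every Col1 value is covered
    )
```

This only verifies a candidate; it does not find one.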
So the above example could yield this result
Col1 | Col2
-----+-----
A    | 4
B    | 2
C    | 3
D    | 1
Notice that A-4 must be in the result because there are 4 unique letters and 4 unique numbers, so if you don't associate A with 4, no remaining subset maps every value in Col1 while retaining the uniqueness of Col2.
Also, notice that it would be equally valid to replace B-2 and C-3 with B-3 and C-2. I don't care which subset is selected, but I want one that fulfills all the requirements.
Not every set of data will have a sub-set that fulfills all the requirements, but I want to get as close as possible.
I'm trying to do this with a SQL query. I had a query that seemed to accomplish this for one set of data, but then I had to rewrite it for a slightly different set (where Col2 is actually a pair of columns) and could not reproduce my earlier success. My first solution used MIN() and GROUP BY and a couple of joins on aggregated results to mark duplicates for elimination in a loop until there was nothing left to safely eliminate. My more recent solution replaces the GROUP BY queries with ROW_NUMBER() expressions that use PARTITION BY. But I can't figure out how to handle the cases where there are multiple valid result sets from multiply cross-linked pairs like B and C in the above example. My earlier query might have handled it, but I can't quite comprehend what I did (must have had a good day when I wrote that one). Perhaps I need to do a JOIN on the ROW_NUMBER() expressions in my sub-queries? My brain gave out for today. I hope someone can help me find an ingeniously simple solution.
Recommended answer
It seems to me that you're aiming for something that SQL is not strong enough for. This is a non-standard algorithmic task, and I think you need a real programming language to achieve it. Your task reminds me of chess riddles.
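For what it's worth, one way to read the task is as maximum bipartite matching: Col1 values on one side, Col2 values on the other, each row an edge, and the goal is the largest set of edges that share no endpoint. Below is a minimal Python sketch using the augmenting-path approach (often called Kuhn's algorithm). It assumes the rows are available as (Col1, Col2) tuples; the names `max_unique_pairs` and `try_assign` are made up for illustration and do not come from the question.

```python
from collections import defaultdict

def max_unique_pairs(rows):
    """Pick a largest subset of rows with no value repeated in either column."""
    options = defaultdict(list)      # Col1 value -> Col2 values it may take
    for left, right in rows:
        options[left].append(right)

    owner = {}                       # Col2 value -> Col1 value currently holding it

    def try_assign(left, seen):
        # Try each candidate; if it is taken, ask its holder to move elsewhere.
        for right in options[left]:
            if right in seen:
                continue
            seen.add(right)
            if right not in owner or try_assign(owner[right], seen):
                owner[right] = left
                return True
        return False

    for left in options:
        try_assign(left, set())      # fresh "seen" set per Col1 value

    return sorted((left, right) for right, left in owner.items())
```

On the sample table from the question this should return four pairs, with A paired to 4. When no complete subset exists, it still returns as many pairs as possible, which is one reasonable reading of "as close as possible". The recursion depth grows with the number of distinct Col1 values, so very large inputs would want an iterative variant or a library routine.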