How to prune duplicate associations to yield a unique most-complete set

Problem description

I hardly know how to state this question, let alone search for answers. But here's my best shot. Assume I have a table:

Col1   Col2
-----+-----
 A   | 1
 A   | 2
 A   | 3
 A   | 4
 B   | 1
 B   | 2
 B   | 3
 C   | 1
 C   | 2
 C   | 3
 D   | 1

I want to find the subset of associations (rows) where:

  1. There are no duplicates in Col1
  2. There are no duplicates in Col2
  3. Every value in Col1 is associated with a value in Col2
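
The three requirements above can be written as a small check. This is only a sketch: the name `check_subset` and the representation of rows as `(col1, col2)` tuples are assumptions made for illustration, not part of the original question.

```python
def check_subset(rows, subset):
    """Report which requirements `subset` meets for the table `rows`.

    Both arguments are lists of (col1, col2) tuples (an assumed layout).
    """
    col1 = [left for left, _ in subset]
    col2 = [right for _, right in subset]
    return {
        # The subset may only contain rows that exist in the table.
        "rows_come_from_table": set(subset) <= set(rows),
        # Requirement 1: no Col1 value appears twice.
        "col1_unique": len(col1) == len(set(col1)),
        # Requirement 2: no Col2 value appears twice.
        "col2_unique": len(col2) == len(set(col2)),
        # Requirement 3: every Col1 value in the table is represented.
        "every_col1_covered": set(col1) == {left for left, _ in rows},
    }


rows = [("A", 1), ("A", 2), ("A", 3), ("A", 4),
        ("B", 1), ("B", 2), ("B", 3),
        ("C", 1), ("C", 2), ("C", 3),
        ("D", 1)]

good = check_subset(rows, [("A", 4), ("B", 2), ("C", 3), ("D", 1)])
bad = check_subset(rows, [("A", 1), ("B", 2), ("C", 3), ("D", 1)])
```

Here `good` passes every check, while `bad` fails `col2_unique` because 1 is used by both A and D.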

So the above example could yield this result:

Col1   Col2
-----+-----
 A   | 4
 B   | 2
 C   | 3
 D   | 1

Notice that A-4 must be in the result because there are 4 unique letters and 4 unique numbers, so if you don't associate A with 4, no remaining subset maps every value in Col1 while keeping Col2 unique.

Also, notice that it would be equally valid to replace B-2 and C-3 with B-3 and C-2. I don't care which subset is selected, but I want one that fulfills all the requirements.

Not every set of data will have a sub-set that fulfills all the requirements, but I want to get as close as possible.

I'm trying to do this with a SQL query. I had a query that seemed to accomplish this for one set of data, but then I had to rewrite it for a slightly different set (where Col2 is actually a pair of columns) and could not reproduce my earlier success. My first solution used MIN() and GROUP BY and a couple of joins on aggregated results to mark duplicates for elimination in a loop until there was nothing left to safely eliminate. My more recent solution replaces the GROUP BY queries with ROW_NUMBER() expressions that use PARTITION BY. But I can't figure out how to handle the cases where there are multiple valid result sets from multiply-cross-linked pairs like B and C in the above example. My earlier query might have handled it, but I can't quite comprehend what I did (must have had a good day when I wrote that one). Perhaps I need to do a JOIN on the ROW_NUMBER() expressions in my sub-queries? My brain gave out for today. I hope someone can help me find an ingeniously simple solution.

Recommended answer

It seems to me that you're aiming for something that SQL is not strong enough for. This is a non-standard algorithmic task, and I think you need a real programming language to achieve it. Your task reminds me of chess riddles.
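
For what it's worth, one way to read the task is as maximum bipartite matching: Col1 values on one side, Col2 values on the other, each row an allowed pairing, and the goal is the largest set of pairings that reuses no value. That framing is an interpretation, not something the question states. Under that assumption, a minimal Python sketch using the classic augmenting-path approach could look like this (the function name `max_matching` and the tuple layout are made up for illustration):

```python
from collections import defaultdict


def max_matching(rows):
    """Return a largest subset of (col1, col2) rows with no repeated col1 or col2."""
    candidates = defaultdict(list)  # col1 value -> col2 values it may take
    for left, right in rows:
        candidates[left].append(right)

    owner = {}  # col2 value -> col1 value currently assigned to it

    def try_assign(left, seen):
        # Try to give `left` a col2 value, re-routing earlier assignments if needed.
        for right in candidates[left]:
            if right in seen:
                continue
            seen.add(right)
            # Take `right` if it is free, or if its current owner can move elsewhere.
            if right not in owner or try_assign(owner[right], seen):
                owner[right] = left
                return True
        return False

    for left in list(candidates):
        try_assign(left, set())

    return sorted((left, right) for right, left in owner.items())


rows = [("A", 1), ("A", 2), ("A", 3), ("A", 4),
        ("B", 1), ("B", 2), ("B", 3),
        ("C", 1), ("C", 2), ("C", 3),
        ("D", 1)]
result = max_matching(rows)
```

For the table in the question, `result` has four rows and necessarily contains A-4 and D-1; which of the two B/C pairings it picks depends on input order. When no complete assignment exists, the same function returns the largest partial one, which matches the "as close as possible" requirement. The recursion can get deep on large inputs, so an iterative variant or a library implementation may be a better fit there.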
