Use MySQL to randomly select exclusive records


Question

My table A is as follows:

id (integer)
follow_up (integer, days under observation)
matched_id (integer)


id   ; follow_up  ; matched_id
1    ; 10              ; 19
1    ; 10              ; 20
1    ; 10              ; 21
2    ; 5               ; 22
2    ; 5               ; 23
2    ; 5               ; 24
2    ; 5               ; 19
2    ; 5               ; 20
3    ; 6               ; 25
3    ; 6               ; 26
3    ; 6               ; 27
4    ; 7               ; 19
4    ; 7               ; 28
4    ; 7               ; 29

I would like to limit the result to 2 records per id. The records should be picked at random and be exclusive to each id. For example, if matched_id values "19" and "20" are given to id 1, then "19" and "20" should not be given to id 2; likewise, if "19" is given to id 1, it should not be given to id 4, and so on for the rest of the table.

Desired output

id   ; follow_up  ; matched_id
1    ; 10              ; 19
1    ; 10              ; 20
2    ; 5               ; 22
2    ; 5               ; 23
3    ; 6               ; 25
3    ; 6               ; 26
4    ; 7               ; 28
4    ; 7               ; 29

Please help me. Thank you very much!

Recommended answer

This is a very good and very challenging SQL question.

You have a very challenging set of requirements:

1. No matched_id should appear more than once in the result set.
2. No id should be given more than two matches.
3. The matching should be random.

We will stick to a pure SQL solution, assuming that you can't return, say, a larger result set and do some filtering with business logic in your implementation language.

First, let's tackle random assignment. Randomly ordering items inside groups is a fun problem. I decided to tackle it by ordering on a SHA1 hash of the data in the row (id, follow_up, matched_id), which gives a repeatable result that still feels random. (This would work best if there were a column containing the date/time the row was created or modified.)

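-- Rank the rows inside each id by the SHA1 hash of the row, keep the two
-- lowest-ranked rows per id, then collapse to one row per matched_id.
-- RANK and GROUPS are reserved words from MySQL 8.0.2 on, so the aliases
-- used here are row_rank and ranked.
SELECT * FROM
(
  SELECT
    a.id,
    a.follow_up,
    a.matched_id,
    a.rank_hash,
    COUNT(*) AS row_rank  -- rows in the same id whose hash is <= this row's hash
  FROM
  (SELECT *, SHA1(CONCAT(id, follow_up, matched_id)) AS rank_hash FROM TableA) a
  JOIN
  (SELECT *, SHA1(CONCAT(id, follow_up, matched_id)) AS rank_hash FROM TableA) b
  ON a.rank_hash >= b.rank_hash
  AND a.id = b.id
  GROUP BY a.id, a.follow_up, a.matched_id, a.rank_hash
  ORDER BY a.id, row_rank
) ranked
WHERE row_rank <= 2
-- SELECT * with GROUP BY matched_id only runs when ONLY_FULL_GROUP_BY is
-- disabled; MySQL then keeps an arbitrary row for each matched_id.
GROUP BY matched_id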
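To make the hash-ordering idea easier to experiment with, here is a minimal Python sketch of the same three steps: hash each row, keep the two lowest hashes per id, then keep each matched_id only once. The names `row_hash` and `pick_exclusive` are hypothetical, and the sketch makes one choice the SQL leaves open: when two ids keep the same matched_id, the lower id wins, whereas the query keeps an arbitrary row.

```python
import hashlib
from collections import defaultdict


def row_hash(row):
    """SHA1 hex digest of the concatenated column values of one row."""
    text = "".join(str(value) for value in row)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def pick_exclusive(rows, per_id=2):
    """Keep up to per_id rows per id, ordered by hash, with unique matched_ids."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)  # row = (id, follow_up, matched_id)

    seen = set()    # matched_ids that are already taken
    picked = []
    for id_ in sorted(groups):
        # The two rows with the smallest hash play the role of rank <= 2.
        for row in sorted(groups[id_], key=row_hash)[:per_id]:
            if row[2] not in seen:  # assumption: the lower id wins a conflict
                seen.add(row[2])
                picked.append(row)
    return picked


rows = [
    (1, 10, 19), (1, 10, 20), (1, 10, 21),
    (2, 5, 22), (2, 5, 23), (2, 5, 24), (2, 5, 19), (2, 5, 20),
    (3, 6, 25), (3, 6, 26), (3, 6, 27),
    (4, 7, 19), (4, 7, 28), (4, 7, 29),
]
picked = pick_exclusive(rows)
```

Because the order comes from a hash rather than a random number generator, repeated runs return the same rows. Note that an id can end up with fewer than two rows when one of its two top-ranked matched_ids was already taken.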

This might suffice for your use case if there are enough matched_id values for each id. But what if there is a hidden fourth requirement?

4. If possible, every id should receive a match.

In other words, what if, as a result of the random shuffling, a matched_id is assigned to an id that has several other matches, while further down the result set it is the only match for another id? An optimal solution in which every id receives a matched_id was possible, but it never happens because the matched_ids were used up earlier in the process.

For example:

CREATE TABLE TableA
    (`id` int, `follow_up` int, `matched_id` varchar(1))
;

INSERT INTO TableA
    (`id`, `follow_up`, `matched_id`)
VALUES
    (1, 10, 'A'),
    (1, 10, 'B'),
    (1, 10, 'C'),
    (2, 5, 'D'),
    (2, 5, 'E'),
    (2, 5, 'F'),
    (3, 5, 'C')
;

In the above set, if ids and their matches are assigned randomly and id 1 is assigned matched_id C, then id 3 will not get a matched_id at all.

What if we first find out how many times each matched_id occurs, and order by that first?

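-- Count how often each matched_id occurs and list the rarest first.
-- As above, a.* with GROUP BY a.matched_id needs ONLY_FULL_GROUP_BY disabled.
SELECT
  a.*,
  b.frequency
FROM TableA a
JOIN
( SELECT
    matched_id,
    COUNT(*) AS frequency
  FROM
    TableA
  GROUP BY matched_id
) b
ON a.matched_id = b.matched_id
GROUP BY a.matched_id
ORDER BY b.frequency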

This is where a middleman programming language might come in handy to help limit the result set.
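As one illustration of what that middleman step could look like, the Python sketch below takes the raw rows and assigns matches greedily: ids with the fewest candidates choose first, and each id's remaining candidates are shuffled before it takes up to two. The function name `assign_matches` and its `seed` parameter are assumptions made for this example. This is a heuristic, not a guaranteed optimum; a general solution would need a proper bipartite matching algorithm.

```python
import random
from collections import defaultdict


def assign_matches(rows, per_id=2, seed=None):
    """Greedy sketch: the scarcest ids pick first, candidates in random order."""
    rng = random.Random(seed)

    candidates = defaultdict(list)
    for id_, _follow_up, matched_id in rows:
        candidates[id_].append(matched_id)

    # Shuffle first so that ids with equally many candidates are ordered
    # randomly; the stable sort then moves the scarcest ids to the front.
    ids = list(candidates)
    rng.shuffle(ids)
    ids.sort(key=lambda id_: len(candidates[id_]))

    used = set()                      # matched_ids that are already taken
    result = {id_: [] for id_ in candidates}
    for id_ in ids:
        # Drop taken and duplicate candidates while keeping their order.
        pool = list(dict.fromkeys(m for m in candidates[id_] if m not in used))
        rng.shuffle(pool)
        for matched_id in pool[:per_id]:
            used.add(matched_id)
            result[id_].append(matched_id)
    return result


rows = [
    (1, 10, "A"), (1, 10, "B"), (1, 10, "C"),
    (2, 5, "D"), (2, 5, "E"), (2, 5, "F"),
    (3, 5, "C"),
]
result = assign_matches(rows, seed=42)
```

With the example table, id 3 has a single candidate and therefore chooses first, so it always receives C; id 1 is then left with A and B, and id 2 receives two of D, E and F at random.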

But note that we have also lost our requirement of randomness! As you can see, a pure SQL solution might get pretty ugly. It is indeed possible, though, by combining the techniques outlined above.

Hopefully this will get your imagination firing.
