T-SQL中的随机加权选择 [英] Random Weighted Choice in T-SQL

查看:131
本文介绍了T-SQL中的随机加权选择的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如何根据所有候选行的权重在T-SQL中随机选择表行?

How do you randomly select a table row in T-SQL based on an applied weight for all candidate rows?

例如,我在表中有一组行,权重分别为50、25和25(总计为100,但不需要),并且我想随机选择其中一个行,并获得等同的统计结果到各自的重量.

For example, I have a set of rows in a table weighted at 50, 25, and 25 (which adds up to 100 but does not need to), and I want to select one of them randomly with a statistical outcome equivalent to the respective weight.

推荐答案

Dane的答案包括以引入平方律的方式进行的自连接.联接后的(n*n/2)行,表中有n行.

Dane's answer includes a self joins in a way that introduces a square law. (n*n/2) rows after the join where there are n rows in the table.

更理想的是只解析一次表.

What would be more ideal is to be able to just parse the table once.

DECLARE @id int, @weight_sum int, @weight_point int
DECLARE @table TABLE (id int, weight int)

INSERT INTO @table(id, weight) VALUES(1, 50)
INSERT INTO @table(id, weight) VALUES(2, 25)
INSERT INTO @table(id, weight) VALUES(3, 25)

SELECT @weight_sum = SUM(weight)
FROM @table

SELECT @weight_point = FLOOR(((@weight_sum - 1) * RAND() + 1), 0)

SELECT
    @id = CASE WHEN @weight_point < 0 THEN @id ELSE [table].id END,
    @weight_point = @weight_point - [table].weight
FROM
    @table [table]
ORDER BY
    [table].Weight DESC

这将遍历表格,将@id设置为每条记录的id值,同时递减@weight点.最终,@weight_point将变为负数.这意味着所有先前权重的SUM大于随机选择的目标值.这是我们想要的记录,因此从那时起,我们将@id设置为其自身(忽略表中的任何ID).

This will go through the table, setting @id to each record's id value while at the same time decrementing @weight point. Eventually, the @weight_point will go negative. This means that the SUM of all preceding weights is greater than the randomly chosen target value. This is the record we want, so from that point onwards we set @id to itself (ignoring any IDs in the table).

这仅在表中运行一次,但是即使选择的值是第一条记录,也必须在整个表中运行.因为平均位置在表格的一半位置(如果按权重递增排序则更少),编写循环可能会更快...(特别是如果权重在同一组中):

This runs through the table just once, but does have to run through the entire table even if the chosen value is the first record. Because the average position is half way through the table (and less if ordered by ascending weight) writing a loop could possibly be faster... (Especially if the weightings are in common groups):

DECLARE @id int, @weight_sum int, @weight_point int, @next_weight int, @row_count int
DECLARE @table TABLE (id int, weight int)

INSERT INTO @table(id, weight) VALUES(1, 50)
INSERT INTO @table(id, weight) VALUES(2, 25)
INSERT INTO @table(id, weight) VALUES(3, 25)

SELECT @weight_sum = SUM(weight)
FROM @table

SELECT @weight_point = ROUND(((@weight_sum - 1) * RAND() + 1), 0)

SELECT @next_weight = MAX(weight) FROM @table
SELECT @row_count   = COUNT(*)    FROM @table
SET @weight_point = @weight_point - (@next_weight * @row_count)

WHILE (@weight_point > 0)
BEGIN
    SELECT @next_weight = MAX(weight) FROM @table WHERE weight < @next_weight
    SELECT @row_count   = COUNT(*)    FROM @table WHERE weight = @next_weight
    SET @weight_point = @weight_point - (@next_weight * @row_count)
END

-- # Once the @weight_point is less than 0, we know that the randomly chosen record
-- # is in the group of records WHERE [table].weight = @next_weight

SELECT @row_count = FLOOR(((@row_count - 1) * RAND() + 1), 0)

SELECT
    @id = CASE WHEN @row_count < 0 THEN @id ELSE [table].id END,
    @row_count = @row_count - 1
FROM
    @table [table]
WHERE
    [table].weight = @next_weight
ORDER BY
    [table].Weight DESC

这篇关于T-SQL中的随机加权选择的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆