Select random row from a PostgreSQL table with weighted row probabilities

Question

Example input:


SELECT * FROM test;
 id | percent
----+---------
  1 |      50
  2 |      35
  3 |      15
(3 rows)

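How would you write a query that returns the row with id = 1 on average 50% of the time, the row with id = 2 35% of the time, and the row with id = 3 15% of the time?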

I tried something like SELECT id FROM test ORDER BY percent * random() DESC LIMIT 1, but it gives the wrong distribution. After 10,000 runs I get {1=6293, 2=3302, 3=405}, whereas I expected something close to {1=5000, 2=3500, 3=1500}.

Any ideas?
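A side note on why the attempt above is skewed: multiplying each weight by its own independent random() draw and taking the maximum is not the same as sampling in proportion to the weights. For the three example rows, the chance of "winning" works out to 22/35 (about 62.9%), 23/70 (about 32.9%) and 3/70 (about 4.3%), which is close to the distribution reported in the question. The Python sketch below only emulates that query outside the database; the function name `pick_naive` and the seed are illustrative assumptions, not part of the original post.

```python
import random
from collections import Counter

# Same rows as the example table: (id, percent)
ROWS = [(1, 50), (2, 35), (3, 15)]

def pick_naive(rows, rng):
    """Emulate ORDER BY percent * random() DESC LIMIT 1.

    Every row gets its own independent random draw, and the row with
    the largest product is returned.
    """
    return max(rows, key=lambda row: row[1] * rng.random())[0]

rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
naive_counts = Counter(pick_naive(ROWS, rng) for _ in range(10_000))
print(sorted(naive_counts.items()))
```

The counts should land near 6286, 3286 and 429 rather than 5000, 3500 and 1500, so the heaviest row is over-represented and the lightest one is starved.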

Recommended answer

This should do the trick:

WITH CTE AS (
    SELECT random() * (SELECT SUM(percent) FROM YOUR_TABLE) R
)
SELECT *
FROM (
    SELECT id, SUM(percent) OVER (ORDER BY id) S, R
    FROM YOUR_TABLE CROSS JOIN CTE
) Q
WHERE S >= R
ORDER BY id
LIMIT 1;

The sub-query Q gives the following result (id and the running total S; the R column is omitted here):

 id |  S
----+-----
  1 |  50
  2 |  85
  3 | 100

We then simply generate a random number R in the range [0, 100) and pick the first row whose running total S is at or beyond that number (the WHERE clause, combined with ORDER BY id and LIMIT 1). We use a common table expression (WITH) to ensure the random number is calculated only once, so every row is compared against the same value.
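To check the logic outside the database, the same steps can be sketched in Python. This is only a simulation of what the query does, under the assumption that the table holds the three example rows; `pick_weighted` and the seed are hypothetical names and values chosen for illustration.

```python
import random
from collections import Counter
from itertools import accumulate

# Same rows as the example table: (id, percent)
ROWS = [(1, 50), (2, 35), (3, 15)]

def pick_weighted(rows, rng):
    """Mirror the query: draw R once, return the first id whose running total S >= R."""
    ids = [row_id for row_id, _ in rows]
    running = list(accumulate(weight for _, weight in rows))  # S: 50, 85, 100
    r = rng.random() * running[-1]                            # R in [0, total)
    for row_id, s in zip(ids, running):
        if s >= r:
            return row_id
    return ids[-1]  # guard only; the last running total always covers R

rng = random.Random(42)  # fixed seed, an arbitrary choice for repeatability
counts = Counter(pick_weighted(ROWS, rng) for _ in range(10_000))
print(sorted(counts.items()))
```

Over 10,000 draws the counts should come out close to 5000, 3500 and 1500, which is the distribution the question asks for.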

BTW, the SELECT SUM(percent) FROM YOUR_TABLE allows you to use any weights in percent. They don't strictly need to be percentages (i.e. add up to 100), because the random number is scaled to the actual total.
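As a small illustration of that point, the sketch below computes the selection probability each row ends up with when the random number is scaled by the sum of the weights. The weights 10, 7 and 3 are made-up values chosen to have the same proportions as the example table, and `expected_shares` is a hypothetical helper name.

```python
import math

def expected_shares(weights):
    """Probability of each id when R is drawn from [0, sum of weights)."""
    total = sum(weights.values())
    return {row_id: weight / total for row_id, weight in weights.items()}

# Weights from the example table (they happen to add up to 100)
percent_shares = expected_shares({1: 50, 2: 35, 3: 15})

# Hypothetical weights with the same proportions that add up to 20
raw_shares = expected_shares({1: 10, 2: 7, 3: 3})

for row_id in percent_shares:
    print(row_id, percent_shares[row_id], raw_shares[row_id])
```

Both sets of weights yield the same shares (0.5, 0.35 and 0.15), so only the ratios between the weights matter.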

[SQL Fiddle]
