Select random row from a PostgreSQL table with weighted row probabilities
Question
Example input:
SELECT * FROM test;
id | percent
----+----------
1 | 50
2 | 35
3 | 15
(3 rows)
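How would you write a query such that, on average, I get the row with id=1 50% of the time, the row with id=2 35% of the time, and the row with id=3 15% of the time?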
I tried something like SELECT id FROM test ORDER BY percent * random() DESC LIMIT 1, but it gives wrong results. After 10,000 runs I get a distribution like {1=6293, 2=3302, 3=405}, but I expected the distribution to be nearly {1=5000, 2=3500, 3=1500}.
Any ideas?
Recommended answer
This should do the trick:
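WITH CTE AS (
    -- Draw one random threshold in [0, total weight).
    SELECT random() * (SELECT SUM(percent) FROM YOUR_TABLE) R
)
SELECT *
FROM (
    -- S is the running total of the weights, ordered by id.
    SELECT id, SUM(percent) OVER (ORDER BY id) S, R
    FROM YOUR_TABLE CROSS JOIN CTE
) Q
-- Keep the first row whose running total reaches the threshold.
WHERE S >= R
ORDER BY id
LIMIT 1;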
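The sub-query Q gives the following result (only id and the running total S are shown; R is the same random number on every row):
 id |  s
----+-----
  1 |  50
  2 |  85
  3 | 100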
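To look at those running totals on their own, the window function can be run without the random threshold. This is a minimal sketch that assumes the table is named test, as in the question (the answer's YOUR_TABLE is a placeholder):

```sql
-- Running total of the weights, ordered by id.
-- Assumes the question's table: test(id, percent).
SELECT id,
       percent,
       SUM(percent) OVER (ORDER BY id) AS s
FROM test
ORDER BY id;
```

With the sample data, s comes out as 50, 85 and 100, so each row owns a slice of the range whose width equals its weight.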
We then simply generate a random number in the range [0, 100) and pick the first row whose running total is at or beyond that number (the WHERE clause). We use a common table expression (WITH) to ensure the random number is calculated only once.
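One way to sanity-check the distribution is to repeat the draw many times inside a single query. The following is only a sketch: it assumes the table is named test as in the question, and the names draws, running and picked are made up for illustration. Each generated row gets its own random threshold, and a LATERAL sub-query applies the same "first running total at or beyond the threshold" rule to it.

```sql
WITH draws AS (
    -- One independent random threshold per simulated draw.
    SELECT random() * (SELECT SUM(percent) FROM test) AS r
    FROM generate_series(1, 10000)
),
running AS (
    -- Running total of the weights, ordered by id.
    SELECT id, SUM(percent) OVER (ORDER BY id) AS s
    FROM test
)
SELECT picked.id, COUNT(*) AS hits
FROM draws AS d
CROSS JOIN LATERAL (
    -- Same selection rule as the answer, applied to this draw's threshold.
    SELECT id
    FROM running
    WHERE s >= d.r
    ORDER BY id
    LIMIT 1
) AS picked
GROUP BY picked.id
ORDER BY picked.id;
```

With the sample weights, the hit counts should land close to 5000, 3500 and 1500, varying a little from run to run.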
BTW, the SELECT SUM(percent) FROM YOUR_TABLE allows you to have any weights in percent; they don't strictly need to be percentages (i.e. add up to 100).
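As an illustration of that point, here is the same query run against hypothetical weights 3, 2 and 1, which sum to 6 rather than 100. The inline weights CTE and its values are invented for this sketch and stand in for a real table:

```sql
WITH weights(id, percent) AS (
    -- Hypothetical weights that sum to 6, not 100.
    VALUES (1, 3), (2, 2), (3, 1)
),
cte AS (
    -- Threshold is scaled by the actual total, whatever it is.
    SELECT random() * (SELECT SUM(percent) FROM weights) AS r
)
SELECT id
FROM (
    SELECT id, SUM(percent) OVER (ORDER BY id) AS s, r
    FROM weights CROSS JOIN cte
) AS q
WHERE s >= r
ORDER BY id
LIMIT 1;
```

Because the threshold is drawn from [0, 6), the rows should come back with probabilities of about 1/2, 1/3 and 1/6.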