Select random row from a PostgreSQL table with weighted row probabilities

Problem description

Example input:

SELECT * FROM test;
 id | percent   
----+----------
  1 | 50 
  2 | 35   
  3 | 15   
(3 rows)

How would you write a query that, on average, returns the row with id=1 50% of the time, the row with id=2 35% of the time, and the row with id=3 15% of the time?

I tried something like SELECT id FROM test ORDER BY percent * random() DESC LIMIT 1, but it gives the wrong distribution. After 10,000 runs I get something like {1=6293, 2=3302, 3=405}, but I expected it to be close to {1=5000, 2=3500, 3=1500}.
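The skew is not a sampling fluke. Sorting by the weight times random() returns the row with the largest of several independently scaled uniform draws, and that maximum favors the heavy rows far more than proportionally. Integrating over the draws for weights 50/35/15 gives exact probabilities of 22/35 ≈ 62.9%, 23/70 ≈ 32.9% and 3/70 ≈ 4.3%, which is in line with the 10,000-run counts above. The following is a minimal Python sketch that reproduces this outside the database. The helper names `naive_pick` and `simulate` are hypothetical, and the weights are hard-coded from the example table.

```python
import random
from collections import Counter

# Weights from the example table `test` (id -> percent), hard-coded for illustration
WEIGHTS = {1: 50, 2: 35, 3: 15}


def naive_pick(weights, rng):
    """Mimic ORDER BY percent * random() DESC LIMIT 1: draw one uniform
    number per row, scale it by the row's weight, keep the largest product."""
    # max() calls the key function once per row, so each row gets one draw
    return max(weights, key=lambda row_id: weights[row_id] * rng.random())


def simulate(n=100_000, seed=42):
    """Return the observed frequency of each id over n naive picks."""
    rng = random.Random(seed)
    counts = Counter(naive_pick(WEIGHTS, rng) for _ in range(n))
    return {row_id: counts[row_id] / n for row_id in WEIGHTS}


print(simulate())
```

The frequencies land near 0.63 / 0.33 / 0.04 instead of 0.50 / 0.35 / 0.15. This is why a different construction is needed rather than a tweak to the multiplier.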

Any ideas?

Solution

This should do the trick:

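WITH CTE AS (
    -- Draw one random number in [0, total weight); random() returns a value in [0, 1)
    SELECT random() * (SELECT SUM(percent) FROM YOUR_TABLE) AS R
)
SELECT *
FROM (
    -- Running (cumulative) sum of the weights in id order, paired with the single random number
    SELECT id, SUM(percent) OVER (ORDER BY id) AS S, R
    FROM YOUR_TABLE CROSS JOIN CTE
) Q
-- Keep the rows whose running total reaches the random number...
WHERE S >= R
-- ...and return the first of them
ORDER BY id
LIMIT 1;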

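For the example data, the sub-query Q yields these running totals S (the R column, which holds the same random value on every row, is omitted):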

 id |  S
----+-----
  1 |  50
  2 |  85
  3 | 100

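We generate a random number R in the range [0, 100) and pick the first row, in id order, whose running total S is at or above R (the WHERE clause together with ORDER BY id LIMIT 1). Each row therefore owns a slice of that range as wide as its weight: id=1 is returned for R up to 50, id=2 for R between 50 and 85, and id=3 for R between 85 and 100. We use a common table expression (WITH) to ensure the random number is calculated only once, instead of once per row.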
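To check the logic outside the database, here is a minimal Python sketch that mirrors the query step by step. It computes a running total of the weights (like SUM(percent) OVER (ORDER BY id)). It makes one uniform draw R in [0, total), which is the role of the CTE. It then returns the first row whose running total is at least R (WHERE S >= R ORDER BY id LIMIT 1). The helper names `pick_weighted`, `simulate` and `TEST_ROWS` are hypothetical, and the rows are hard-coded from the example table rather than read from PostgreSQL.

```python
import bisect
import itertools
import random
from collections import Counter


def pick_weighted(rows, rng):
    """Mirror of the SQL answer; rows is a list of (id, weight) pairs."""
    # Order by id, as the window function's ORDER BY id does
    rows = sorted(rows)
    ids = [row_id for row_id, _ in rows]
    # Running total S, like SUM(percent) OVER (ORDER BY id)
    running = list(itertools.accumulate(weight for _, weight in rows))
    # One random number R in [0, total), drawn once (what the CTE guarantees)
    r = rng.random() * running[-1]
    # First index whose running total is >= R (WHERE S >= R ORDER BY id LIMIT 1)
    return ids[bisect.bisect_left(running, r)]


# Example data from the question: (id, percent)
TEST_ROWS = [(1, 50), (2, 35), (3, 15)]


def simulate(rows, n=100_000, seed=7):
    """Return the observed frequency of each id over n weighted picks."""
    rng = random.Random(seed)
    counts = Counter(pick_weighted(rows, rng) for _ in range(n))
    return {row_id: counts[row_id] / n for row_id, _ in rows}


print(simulate(TEST_ROWS))
```

The frequencies come out close to 0.50, 0.35 and 0.15. `bisect_left` finds the first running total that is not less than R, which is the in-memory counterpart of the filter plus ORDER BY id LIMIT 1. Drawing R once per pick, rather than once per row, is what keeps each row's chance proportional to the width of its slice.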

BTW, because the random number is scaled by SELECT SUM(percent) FROM YOUR_TABLE, the values in percent can be any non-negative weights; they don't strictly need to be percentages (i.e. they don't need to add up to 100).
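The arithmetic behind this can be made explicit. Row i is returned when R falls between the previous running total and its own, an interval exactly as wide as its weight. Because R is uniform over [0, total), the probability of row i is weight_i / total, whatever the total happens to be. This short Python sketch computes those exact probabilities. It assumes integer weights so that `Fraction` can be built directly, and `selection_probabilities` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import accumulate


def selection_probabilities(rows):
    """Exact probability that the running-total query returns each id.

    rows is a list of (id, integer weight) pairs. Each row's slice of
    [0, total) runs from the previous running total to its own, so its
    probability is (its running total - previous running total) / total.
    """
    rows = sorted(rows)  # same id order as the query
    running = [0, *accumulate(weight for _, weight in rows)]
    total = running[-1]
    return {
        row_id: Fraction(running[i + 1] - running[i], total)
        for i, (row_id, _) in enumerate(rows)
    }


# Weights 5, 3, 2 sum to 10, not 100, yet give 50% / 30% / 20%
print(selection_probabilities([(1, 5), (2, 3), (3, 2)]))
# {1: Fraction(1, 2), 2: Fraction(3, 10), 3: Fraction(1, 5)}
```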


这篇关于从具有加权行概率的 PostgreSQL 表中选择随机行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆