我需要一个函数从表中选择88个随机行(无重复项) [英] I need a function to select 88 random rows from a table (without duplicates)
问题描述
我有一个带有几列的表(id,描述,创建的(时间戳)和ipaddress)。我已插入200行作为伪数据。我需要一种方法来从该表中拉出88个无重复的随机行。
I have a table with a few columns (id, description, created (timestamp) and ipaddress). I have inserted 200 rows as dummy data. I need a way to pull 88 random rows with no duplicates from that table.
我已经尝试过:
create or replace function GetRandomCrazy88() returns setof varchar(255) as
'
select description
from task
left join tagassignment t on task.id = t.taskid
order by random()
limit 88;
' language 'sql';
但这将返回重复的行。
But this returns duplicate rows.
我也尝试过此操作(有点失控):
I also tried this (it got a bit out of hand):
CREATE OR REPLACE FUNCTION GetRandomCrazy88(amount INTEGER)
RETURNS SETOF VARCHAR(255) AS
$$
DECLARE
tasklist INTEGER[] := '{}'::INTEGER[];
randomid INTEGER;
counter INTEGER := 0;
BEGIN
WHILE counter <= amount LOOP
SELECT CASE WHEN id = 0 THEN 1 ELSE id END INTO randomid
FROM ROUND(RANDOM() * (SELECT COUNT(*) - 1 FROM task)) AS id;
IF randomid = ANY(tasklist) OR ARRAY_LENGTH(tasklist, 1) IS NULL THEN
tasklist = array_append(tasklist, randomid);
counter := counter + 1;
ELSE
RAISE NOTICE 'DUPLICATE ID!!!';
END IF;
END LOOP;
RETURN QUERY SELECT description
FROM task t
WHERE t.id = ANY(tasklist);
END;
$$ LANGUAGE plpgsql
SECURITY DEFINER;
在while循环中失败。它永远不会达到期望的88个数字,因为它无法在if语句中向数组添加任何内容,因为该数组为空且值为NULL。
It fails in the while loop. It never reaches the desired 88 numbers, since it can't add anything to the array in the if-statement, since the array is empty with a NULL-value.
有什么方法可以准确地获得88个随机行,而没有重复吗?
Is there any way I can get exactly 88 random rows, without any duplicates?
推荐答案
这是一个您可能会喜欢的快速解决方案:
Here's a quick solution that you might like:
CREATE EXTENSION IF NOT EXISTS tsm_system_rows;
select * from task
tablesample system_rows (88);
供参考,TABLESAMPLE在SELECT文档中:
https://www.postgresql.org/docs/current/sql-select.html
For reference, TABLESAMPLE is in the docs for SELECT: https://www.postgresql.org/docs/current/sql-select.html
以下是该功能的不错写法:
Here's quite a good write-up of the feature:
https://www.2ndquadrant.com/en/blog/tablesample-in-postgresql-9 -5-2 /
...以及同一作者关于随机抽样的一般主题的另一篇文章:
...and another piece on the general subject of random sampling by the same author:
https://www.2ndquadrant.com/zh-CN/blog/tablesample-and-other-methods-for-getting-random-tuples/
tsm_system_rows是两个标准采样扩展之一,在此处记录:
https://www.postgresql.org/docs/current/tsm-system-rows.html
tsm_system_rows is one of two standard sampling extensions, documented here: https://www.postgresql.org/docs/current/tsm-system-rows.html
嘿!很高兴您提出这个问题。我倾向于使用开箱即用的SELECT中内置的BERNOULLI方法,但是它基于百分比。我只是尝试了一下,它运行良好:
Hey! I'm glad you asked this question. I tend to use the BERNOULLI method, which is built into SELECT out of the box, but it's based on a percentage. I just tried this out and it works fine:
select * from task
tablesample BERNOULLI (1)
limit 88
这篇关于我需要一个函数从表中选择88个随机行(无重复项)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!