我需要一个函数从表中选择88个随机行(无重复项) [英] I need a function to select 88 random rows from a table (without duplicates)

查看:137
本文介绍了我需要一个函数从表中选择88个随机行(无重复项)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个带有几列的表(id,描述,创建的(时间戳)和ipaddress)。我已插入200行作为伪数据。我需要一种方法来从该表中拉出88个无重复的随机行。

I have a table with a few columns (id, description, created (timestamp) and ipaddress). I have inserted 200 rows as dummy data. I need a way to pull 88 random rows with no duplicates from that table.

我已经尝试过:

create or replace function GetRandomCrazy88() returns setof varchar(255) as
'
    select description
    from task
             left join tagassignment t on task.id = t.taskid
    order by random()
    limit 88;
' language 'sql';

但这将返回重复的行。

But this returns duplicate rows.

我也尝试过此操作(有点失控):

I also tried this (it got a bit out of hand):

CREATE OR REPLACE FUNCTION GetRandomCrazy88(amount INTEGER)
    RETURNS SETOF VARCHAR(255) AS
$$
DECLARE

    tasklist INTEGER[] := '{}'::INTEGER[];

    randomid INTEGER;
    counter INTEGER := 0;

BEGIN
    WHILE counter <= amount LOOP

        SELECT CASE WHEN id = 0 THEN 1 ELSE id END INTO randomid
        FROM ROUND(RANDOM() * (SELECT COUNT(*) - 1 FROM task)) AS id;

        IF randomid = ANY(tasklist) OR ARRAY_LENGTH(tasklist, 1) IS NULL THEN
            tasklist = array_append(tasklist, randomid);
            counter := counter + 1;
        ELSE
            RAISE NOTICE 'DUPLICATE ID!!!';
        END IF;
    END LOOP;

    RETURN QUERY SELECT description
    FROM task t
    WHERE t.id = ANY(tasklist);

END;
$$ LANGUAGE plpgsql
    SECURITY DEFINER;

在while循环中失败。它永远不会达到期望的88个数字,因为它无法在if语句中向数组添加任何内容,因为该数组为空且值为NULL。

It fails in the while loop. It never reaches the desired 88 numbers, since it can't add anything to the array in the if-statement, since the array is empty with a NULL-value.

有什么方法可以准确地获得88个随机行,而没有重复吗?

Is there any way I can get exactly 88 random rows, without any duplicates?

推荐答案

这是一个您可能会喜欢的快速解决方案:

Here's a quick solution that you might like:

CREATE EXTENSION IF NOT EXISTS tsm_system_rows;

     select * from task 
tablesample system_rows (88);

供参考,TABLESAMPLE在SELECT文档中:
https://www.postgresql.org/docs/current/sql-select.html

For reference, TABLESAMPLE is in the docs for SELECT: https://www.postgresql.org/docs/current/sql-select.html

以下是该功能的不错写法:

Here's quite a good write-up of the feature:

https://www.2ndquadrant.com/en/blog/tablesample-in-postgresql-9 -5-2 /

...以及同一作者关于随机抽样的一般主题的另一篇文章:

...and another piece on the general subject of random sampling by the same author:

https://www.2ndquadrant.com/zh-CN/blog/tablesample-and-other-methods-for-getting-random-tuples/

tsm_system_rows是两个标准采样扩展之一,在此处记录:
https://www.postgresql.org/docs/current/tsm-system-rows.html

tsm_system_rows is one of two standard sampling extensions, documented here: https://www.postgresql.org/docs/current/tsm-system-rows.html

嘿!很高兴您提出这个问题。我倾向于使用开箱即用的SELECT中内置的BERNOULLI方法,但是它基于百分比。我只是尝试了一下,它运行良好:

Hey! I'm glad you asked this question. I tend to use the BERNOULLI method, which is built into SELECT out of the box, but it's based on a percentage. I just tried this out and it works fine:

select * from task 
tablesample BERNOULLI (1)
limit 88

这篇关于我需要一个函数从表中选择88个随机行(无重复项)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆