从H2数据库中的大表中选择随机行 [英] Selecting random rows from a big table in H2 database

查看:156
本文介绍了从H2数据库中的大表中选择随机行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的数据库中有一个大表(可能有数百万条记录),我需要选择#X随机行(假设#X在10到50之间),但我需要这个查询尽可能最佳。 / p>

该表如下所示:

  CREATE TABLE样本(
id bigint auto_increment PRIMARY KEY,
user_id bigint NOT NULL,
screen_name VARCHAR NOT NULL,
...
);

我一直在搜索,我找到了这样的答案:

  SELECT * FROM sample ORDER BY RAND()limit X. 

但在我看来,这将获取整个表然后对其进行排序,不是吗?



我认为最好生成10或50个随机整数并执行 select * from sample in rowid in(< random integer list>)。但afaik,在H2中缺少rowid概念,所以我可以选择在我的表中使用ID列。



如果我可以使用单个SQL查询执行此任务,那将是非常棒的。



任何更好的建议?

解决方案

以下脚本非常有效地选择每一行。它假设ID中没有间隙。如果可能存在间隙,则可能需要将范围(1,100)增加到范围(1,200)左右。要获得随机行,最后的公式需要稍微更改一下:

  drop table test; 

创建表测试(
id bigint auto_increment主键,
名称varchar(255));

插入测试
选择x,'Hello'|| x来自system_range(50,1200);

select * from test t,system_range(1,100)range
其中t.id = x *(从test中选择max(id)-min(id))/ 100 +
(从测试中选择min(id));


I have a big table in my database (potentially millions of records) and I need to select #X random rows (let's say #X between 10 and 50) , but I need this query to be as optimal as possible.

The table looks like this:

CREATE TABLE sample (
    id bigint auto_increment PRIMARY KEY,
    user_id bigint NOT NULL,
    screen_name VARCHAR NOT NULL,
    ...
);

I've searched around and I found answers like this:

SELECT * FROM sample ORDER BY RAND() limit X.

But It looks to me that this will fetch the full table then sort it, isn't it?

I think it would be best to generate 10 or 50 random integers and do a select * from sample where rowid in (<random integer list>). But afaik, rowid concept is missing in H2, so I may opt for using the ID column in my table.

It would be awesome if I can do this task with a single SQL query.

Any better advice?

解决方案

The following script selects every nth row quite efficiently. It assumes there are no gaps in the ids. If gaps are possible, then you might want to increase the range(1, 100) to range(1, 200) or so. To get random rows, the formula at the very end would need to be changed a bit:

drop table test;

create table test(
  id bigint auto_increment primary key, 
  name varchar(255));

insert into test 
select x, 'Hello ' || x from system_range(50, 1200);

select * from test t, system_range(1, 100) range
where t.id = x * (select max(id)-min(id) from test) / 100 + 
(select min(id) from test);

这篇关于从H2数据库中的大表中选择随机行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆