每组选择N个随机记录 [英] Select N random records per group
问题描述
大家好,大家好! 我需要从每个组中选择N条随机记录.
Hallo and good sunday to everybody. I need to select N random records from each group.
从Quassnoi的查询开始
Starting from the query of Quassnoi
http://explainextended.com/2009/03/01/selecting -random-rows/
选择我编写此存储过程的X条随机记录
to select X random record I wrote this store procedure
delimiter //
drop procedure if exists casualiPerGruppo //
create procedure casualiPerGruppo(in tabella varchar(50),in campo varchar(50),in numPerGruppo int)
comment 'Selezione di N record casuali per gruppo'
begin
declare elenco_campi varchar(255);
declare valore int;
declare finite int default 0;
declare query1 varchar(250);
declare query2 varchar(250);
declare query3 varchar(250);
declare query4 varchar(250);
declare cur_gruppi cursor for select gruppo from tmp_view;
declare continue handler for not found set finite = 1;
drop table if exists tmp_casuali;
set @query1 = concat('create temporary table tmp_casuali like ', tabella);
prepare stmt from @query1;
execute stmt;
deallocate prepare stmt;
set @query2 = concat('create or replace view tmp_view as select ',campo,' as gruppo from ',tabella,' group by ',campo);
prepare stmt from @query2;
execute stmt;
deallocate prepare stmt;
open cur_gruppi;
mio_loop:loop
fetch cur_gruppi into valore;
if finite = 1 then
leave mio_loop;
end if;
set @query3 = concat("select group_concat(column_name) into @elenco_campi
from information_schema.columns
where table_name = '",tabella,"' and table_schema = database()");
prepare stmt from @query3;
execute stmt;
deallocate prepare stmt;
set @query4 = concat('insert into tmp_casuali select ',
@elenco_campi,' from (
select @cnt := count(*) + 1,
@lim :=', numPerGruppo,
' from ',tabella,
' where ',campo,' = ', valore,
' ) vars
straight_join
(
select r.*,
@lim := @lim - 1
from ', tabella, ' r
where (@cnt := @cnt - 1)
and rand() < @lim / @cnt and ', campo, ' = ', valore ,
') i');
prepare stmt from @query4;
execute stmt;
deallocate prepare stmt;
end loop;
close cur_gruppi;
select * from tmp_casuali;
end //
delimiter ;
我以这种方式回想给您一个想法:
that I recall in this way to give you an idea:
create table prova (
id int not null auto_increment primary key,
id_gruppo int,
altro varchar(10)
) engine = myisam;
insert into prova (id_gruppo,altro) values
(1,'aaa'),(2,'bbb'),(3,'ccc'),(1,'ddd'),(1,'eee'),(2,'fff'),
(2,'ggg'),(2,'hhh'),(3,'iii'),(3,'jjj'),(3,'kkk'),(1,'lll'),(4,'mmm');
call casualiPerGruppo('prova','id_gruppo',2);
我的问题是Quassnoi查询尽管性能非常好,但在大型recorset上甚至需要1秒.因此,如果我在sp中多次应用它,则总时间会增加很多.
My problem is that Quassnoi query, even though is very performant, it takes even 1 second on a large recorset. So if I apply it within my sp several times, the total time increases a lot.
您能建议我一个更好的方法来解决我的问题吗? 预先感谢
Can you suggest me a better way to solve my problem? Thanks in advance
编辑.
create table `prova` (
`id` int(11) not null auto_increment,
`id_gruppo` int(11) default null,
`prog` int(11) default null,
primary key (`id`)
) engine=myisam charset=latin1;
delimiter //
drop procedure if exists inserisci //
create procedure inserisci(in quanti int)
begin
declare i int default 0;
while i < quanti do
insert into prova (id_gruppo,prog) values (
(floor(1 + (rand() * 100))),
(floor(1 + (rand() * 30)))
);
set i = i + 1;
end while;
end //
delimiter ;
call inserisci(1000000);
@Clodoaldo: 我的存储过程
@Clodoaldo: My stored procedure
call casualipergruppo('prova','id_gruppo',2);
给了我200条记录,大约需要23秒.您的存储过程一直在给我错误代码:1473即使我将varchar值增加到20000,对于select来说嵌套也太高.我不知道查询中涉及的并集是否有任何限制.
gives me 200 records and takes about 23 seconds. Your stored procedure keeps on giving me Error Code : 1473 Too high level of nesting for select even though I increase varchar value to 20000. I don't know if there is any limit on unions involved in a query.
推荐答案
我从过程中删除了tabella和campo参数,只是为了使其更易于理解.我确定你可以把它们带回来.
I removed the tabella and campo parameters from the procedure just to make it easier to understand. I'm sure you can bring them back.
delimiter //
drop procedure if exists casualiPerGruppo //
create procedure casualiPerGruppo(in numPerGruppo int)
begin
declare valore int;
declare finite int default 0;
declare query_part varchar(200);
declare query_union varchar(2000);
declare cur_gruppi cursor for select distinct id_gruppo from prova;
declare continue handler for not found set finite = 1;
create temporary table resultset (id int, id_gruppo int, altro varchar(10));
set @query_part = 'select id, id_gruppo, altro from (select id, id_gruppo, altro from prova where id_gruppo = @id_gruppo order by rand() limit @numPerGruppo) ss@id_gruppo';
set @query_part = replace(@query_part, '@numPerGruppo', numPerGruppo);
set @query_union = '';
open cur_gruppi;
mio_loop:loop
fetch cur_gruppi into valore;
if finite = 1 then
leave mio_loop;
end if;
set @query_union = concat(@query_union, concat(' union ', @query_part));
set @query_union = replace(@query_union, '@id_gruppo', valore);
end loop;
close cur_gruppi;
set @query_union = substr(@query_union, 8);
set @query_union = concat('insert into resultset ', @query_union);
prepare stmt from @query_union;
execute stmt;
deallocate prepare stmt;
select * from resultset order by id_gruppo, altro;
drop table resultset;
end //
delimiter ;
这篇关于每组选择N个随机记录的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!