Optimizing MySQL order by rand()
Question
I can't find a good answer to my problem.
I have a MySQL query with an inner join, an order by rand(), and a limit X. When I remove the order by rand(), the query is 10 times faster. Is there a more efficient way to get a random subset of 500 rows? Here's a sample query:
Select * from table1
inner join table2 on table1.in = table2.in
where table1.T = A
order by rand()
limit 500;
Recommended answer
This should help:
Select *
from table1 inner join
table2
on table1.in = table2.in
where table1.T = A and rand() < 1000.0/20000.0
order by rand()
limit 500
This limits the result set to about 1000 random rows before a random sample of 500 is drawn from them; the 20000.0 stands for the approximate number of rows that match without the rand() filter. Fetching more rows than needed simply ensures that the sample is large enough.
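To see why oversampling to about 1000 rows leaves a comfortable margin, the filter can be treated as an independent coin flip per row. The sketch below is in Python rather than SQL, and it assumes the join returns about 20,000 rows before the rand() filter, which is the figure implied by 1000.0/20000.0. The helper name `prefilter_stats` is hypothetical.

```python
import math

def prefilter_stats(total_rows, target_rows):
    """Mean and standard deviation of the number of rows that pass
    rand() < target_rows / total_rows, modelled as a binomial draw."""
    p = target_rows / total_rows
    mean = total_rows * p
    std = math.sqrt(total_rows * p * (1 - p))
    return mean, std

# Assumption: about 20,000 rows match before the rand() filter is applied.
mean, std = prefilter_stats(20000, 1000)

# How many standard deviations separate the expected count from the 500 needed.
margin = (mean - 500) / std
print(f"expected rows: {mean:.0f}, std dev: {std:.1f}, margin: {margin:.1f} sigma")
```

Under that assumption the pre-filter keeps about 1000 rows, give or take roughly 31, so ending up with fewer than 500 is extremely unlikely. If the real row count is much smaller than the denominator used in the query, the margin shrinks accordingly.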
Here is an alternative strategy, building off the "create your own indexes" approach.
Create a temporary table using the following query:
create temporary table results as
(Select *, @rn := @rn + 1 as rn
from table1 inner join
table2
on table1.in = table2.in cross join
(select @rn := 0) const
where table1.T = A
);
You now have a row number column, and you can return the number of rows with:
select @rn;
You can then generate the random ids (the rn values to fetch) in your application.
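A minimal sketch of that application-side step, in Python and under stated assumptions: the row count passed in is the value returned by select @rn, the results table and rn column are the ones created above, and `pick_row_numbers` and `build_query` are hypothetical helper names.

```python
import random

def pick_row_numbers(row_count, sample_size, seed=None):
    """Choose distinct row numbers from 1..row_count, uniformly at random."""
    rng = random.Random(seed)
    k = min(sample_size, row_count)  # cannot sample more rows than exist
    return sorted(rng.sample(range(1, row_count + 1), k))

def build_query(row_numbers):
    """Build the follow-up query against the temporary table.
    The values are integers generated here, not user input."""
    if not row_numbers:
        raise ValueError("no row numbers to fetch")
    id_list = ", ".join(str(n) for n in row_numbers)
    return f"select * from results where rn in ({id_list});"

# Assumption: select @rn reported 20000 rows in the temporary table.
ids = pick_row_numbers(20000, 500, seed=42)
query = build_query(ids)
```

Because the chosen row numbers are distinct, the follow-up query returns exactly 500 rows (or every row, if fewer exist) without sorting the whole result set by rand().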
I would be inclined to keep the processing in the database, using these two queries:
create temporary table results as
(Select *, @rn := @rn + 1 as rn, rand() as therand
from table1 inner join
table2
on table1.in = table2.in cross join
(select @rn := 0) const
where table1.T = A
);
select *
from results
where therand < 1000/@rn
order by therand
limit 500;