How does MySQL's ORDER BY RAND() work?

Question

I've been doing some research and testing on how to do fast random selection in MySQL. In the process I've faced some unexpected results and now I am not fully sure I know how ORDER BY RAND() really works.

I always thought that when you do ORDER BY RAND() on a table, MySQL adds a new column filled with random values, sorts the data by that column, and then you take, for example, the top row, which ended up there at random. I've done lots of googling and testing and finally found that the query Jay offers in his blog is indeed the fastest solution:

SELECT * FROM Table T JOIN (SELECT CEIL(MAX(ID)*RAND()) AS ID FROM Table) AS x ON T.ID >= x.ID LIMIT 1;

While a plain ORDER BY RAND() takes 30-40 seconds on my test table, his query does the work in 0.1 seconds. He explains how it works in his blog, so I'll skip that and move on to the odd thing.

My table is an ordinary table with a PRIMARY KEY id and other non-indexed columns like username, age, etc. Here's the thing I am struggling to explain:

SELECT * FROM table ORDER BY RAND() LIMIT 1; /*30-40 seconds*/
SELECT id FROM table ORDER BY RAND() LIMIT 1; /*0.25 seconds*/
SELECT id, username FROM table ORDER BY RAND() LIMIT 1; /*90 seconds*/

I was sort of expecting to see approximately the same time for all three queries since I am always sorting on a single column. But for some reason this didn't happen. Please let me know if you have any ideas about this. I have a project where I need to do a fast ORDER BY RAND() and personally I would prefer to use

SELECT id FROM table ORDER BY RAND() LIMIT 1;
SELECT * FROM table WHERE id=ID_FROM_PREVIOUS_QUERY LIMIT 1;

which, yes, is slower than Jay's method, but it is smaller and easier to understand. My queries are rather big, with several JOINs and a WHERE clause, and while Jay's method still works, the query grows really big and complex because I need to repeat all the JOINs and WHERE conditions in the joined subquery (called x in his query).
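
The two statements above can also be folded into a single one by putting the cheap id-only sort in a derived table and joining back on the primary key. This is only a minimal sketch, assuming a hypothetical table named items with an integer primary key id (both names are illustrative and not from the original post):

```sql
-- Assumption: `items` is a hypothetical table with PRIMARY KEY `id`.
-- The derived table sorts only the id values; the outer query then
-- fetches the full row for the single id that was picked.
SELECT t.*
FROM items AS t
JOIN (
    SELECT id
    FROM items
    ORDER BY RAND()
    LIMIT 1
) AS r ON t.id = r.id;
```

Any JOINs and WHERE conditions that decide which rows are eligible would still have to go inside the derived table, so this saves a round trip rather than query complexity.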

Thanks for your time!

Recommended answer

While there's no such thing as a "fast order by rand()", there is a workaround for your specific task.

For getting any single random row, you can do what this German blogger does: http://www.roberthartung.de/mysql-order-by-rand-a-case-study-of-alternatives/ (I couldn't see a hotlink url. If anyone sees one, feel free to edit the link.)

The text is in German, but the SQL code is a bit further down the page in big white boxes, so it's not hard to find.

Basically, what he does is write a procedure that does the job of getting a valid row. It generates a random number between 0 and max_id, tries fetching a row with that id, and if it doesn't exist, keeps going until it hits one that does. He allows for fetching x random rows by storing them in a temp table, so you can probably rewrite the procedure to be a bit faster when fetching only one row.
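
As a rough illustration of that idea (this is not the blogger's actual code), a single-row version of such a procedure might look like the sketch below. The table name items, the column id, the procedure name get_random_row and the variable names are all assumptions made for the example, and the DELIMITER lines assume the mysql command-line client. It also assumes ids start at 1, so it draws from 1..max_id rather than 0..max_id.

```sql
-- Assumption: `items` is a hypothetical table with an integer PRIMARY KEY `id`.
DELIMITER //

CREATE PROCEDURE get_random_row()
BEGIN
    DECLARE max_id BIGINT;
    DECLARE candidate BIGINT;
    DECLARE row_exists INT DEFAULT 0;

    SELECT MAX(id) INTO max_id FROM items;

    -- Guard: on an empty table MAX(id) is NULL and the loop would never end.
    IF max_id IS NOT NULL THEN
        WHILE row_exists = 0 DO
            -- RAND() is in [0, 1), so this yields an integer in 1..max_id.
            SET candidate = FLOOR(1 + RAND() * max_id);
            -- Primary-key lookup: 1 if the id exists, 0 if it falls in a gap.
            SELECT COUNT(*) INTO row_exists FROM items WHERE id = candidate;
        END WHILE;

        SELECT * FROM items WHERE id = candidate;
    END IF;
END //

DELIMITER ;

CALL get_random_row();
```

Each attempt is a single primary-key lookup, which is why this can be fast as long as most ids in the range exist.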

The downside of this is that if you delete A LOT of rows and there are huge gaps in the ids, chances are high that it will miss many times before hitting an existing row, making it inefficient.
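
How badly gaps hurt can be estimated before choosing this approach. If a fraction p of the ids in the range is populated, each random guess hits with probability p, so the average number of guesses is 1/p. A small sketch, again assuming a hypothetical table items whose ids start at 1:

```sql
-- Assumption: `items` is a hypothetical table whose ids start at 1.
-- fill_ratio        = share of the range 1..MAX(id) that still has a row
-- expected_attempts = average number of random guesses until a hit (1 / fill_ratio)
SELECT COUNT(*) / MAX(id) AS fill_ratio,
       MAX(id) / COUNT(*) AS expected_attempts
FROM items;
```

For example, 100,000 remaining rows with a maximum id of 1,000,000 give a fill ratio of 0.1, or about 10 lookups per random row on average.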

Update: the different execution times

SELECT * FROM table ORDER BY RAND() LIMIT 1; /*30-40 seconds*/
SELECT id FROM table ORDER BY RAND() LIMIT 1; /*0.25 seconds*/
SELECT id, username FROM table ORDER BY RAND() LIMIT 1; /*90 seconds*/

I was sort of expecting to see approximately the same time for all three queries since I am always sorting on a single column. But for some reason this didn't happen. Please let me know if you have any ideas about this.

It may have to do with indexing. id is indexed and quick to access, whereas adding username to the result means it needs to read that value from each row and put it in the in-memory table. With * it also has to read everything into memory, but it doesn't need to jump around the data file, meaning there's no time lost seeking.

This makes a difference only if there are variable-length columns (varchar/text), which means it has to check the length and then skip that length, as opposed to just skipping a fixed length (or 0) between each row.
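
One way to check whether indexing is really the cause is to compare the execution plans of the three queries with EXPLAIN. The sketch below assumes a hypothetical table items with a primary key id and a non-indexed username column; the actual plans depend on the storage engine and MySQL version.

```sql
-- Assumption: `items` is a hypothetical table with PRIMARY KEY `id`
-- and a non-indexed `username` column.
EXPLAIN SELECT id FROM items ORDER BY RAND() LIMIT 1;
EXPLAIN SELECT id, username FROM items ORDER BY RAND() LIMIT 1;
EXPLAIN SELECT * FROM items ORDER BY RAND() LIMIT 1;
```

In the Extra column, "Using index" means the query is answered from the index alone without touching the row data, while "Using temporary" and "Using filesort" indicate that a temporary table and a sort pass are needed. If only the id-only query shows "Using index", that would be consistent with the explanation above.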
