从大表中获取随机结果 [英] Getting random results from large tables

查看:63
本文介绍了从大表中获取随机结果的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正试图从一张拥有约700万条记录的表中获得4个随机结果.此外,我还想从同一张表中获取4条按类别过滤的随机记录.

现在,正如您想象的那样,对表进行随机排序会导致查询花费几秒钟,这是不理想的.

我想到的另一个用于non-filtered结果集的方法是让PHP选择1-7,000,000左右的一些随机数,然后对查询执行IN(...)以仅捕获那些行-并且是的,我知道此方法有一个警告,如果具有该ID的记录不再存在,您可能得到的结果少于4.

但是,上述方法显然不适用于类别过滤,因为PHP不知道哪些记录编号属于哪个类别,因此无法选择要选择的记录编号.

有没有更好的方法可以做到这一点?我能想到的唯一方法是将每个类别的记录ID存储在另一个表中,然后从中选择随机结果,然后在辅助查询中从主表中仅选择那些记录ID.但我敢肯定有更好的方法!?

解决方案

您当然可以在使用LIMITWHERE(对于类别)的查询中使用RAND()函数.但是,正如您所指出的那样,这需要对数据库进行扫描,这需要花费时间,尤其是在您的情况下,由于数据量大.

再次指出,

将id/category_id存储在另一个表中可能会证明速度更快,但是该表上还必须包含一个LIMITWHERE,它们也将包含相同数量记录作为主表.

一种不同的方法(如果适用)将是每个类别都有一个表,并在其中存储ID.如果您的类别是固定的或不经常更改,那么您应该可以使用该方法.在这种情况下,您将有效地从子句中删除WHERE,并且在每个类别表上使用RAND()加上LIMIT将会更快,因为每个类别表都将包含主表中的记录子集. >

其他一些选择是仅将键/值对数据库用于该操作. MongoDb或Google AppEngine可以为您提供帮助,而且速度非常快.

您还可以在MySQL中采用主/从方式.从服务器实时复制内容,但是当您需要执行昂贵的查询时,可以查询从服务器而不是主服务器,从而将负载传递给另一台计算机.

最后,您可以使用Sphinx,它易于安装和维护.然后,您可以将每个类别查询都视为文档搜索,并让Sphinx将结果随机化.这样,您就可以将此昂贵的操作抵消到另一个层,并让MySQL继续进行其他操作.

只需考虑一些问题.

I'm trying to get 4 random results from a table that holds approx 7 million records. Additionally, I also want to get 4 random records from the same table that are filtered by category.

Now, as you would imagine doing random sorting on a table this large causes the queries to take a few seconds, which is not ideal.

One other method I thought of for the non-filtered result set would be to just get PHP to select some random numbers between 1 - 7,000,000 or so and then do an IN(...) with the query to only grab those rows - and yes, I know that this method has a caveat in that you may get less than 4 if a record with that id no longer exists.

However, the above method obviously will not work with the category filtering as PHP doesn't know which record numbers belong to which category and hence cannot select the record numbers to select from.

Are there any better ways I can do this? Only way I can think of would be to store the record id's for each category in another table and then select random results from that and then select only those record ID's from the main table in a secondary query; but I'm sure there is a better way!?

解决方案

You could of course use the RAND() function on a query using a LIMIT and WHERE (for the category). That however as you pointed out, entails a scan of the database which takes time, especially in your case due to the volume of data.

Your other alternative, again as you pointed out, to store id/category_id in another table might prove a bit faster but again there has to be a LIMIT and WHERE on that table which will also contain the same amount of records as the master table.

A different approach (if applicable) would be to have a table per category and store in that the IDs. If your categories are fixed or do not change that often, then you should be able to use that approach. In that case you will effectively remove the WHERE from the clause and getting a RAND() with a LIMIT on each category table would be faster since each category table will contain a subset of records from your main table.

Some other alternatives would be to use a key/value pair database just for that operation. MongoDb or Google AppEngine can help with that and are really fast.

You could also go towards the approach of a Master/Slave in your MySQL. The slave replicates content in real time but when you need to perform the expensive query you query the slave instead of the master, thus passing the load to a different machine.

Finally you could go with Sphinx which is a lot easier to install and maintain. You can then treat each of those category queries as a document search and let Sphinx randomize the results. This way you offset this expensive operation to a different layer and let MySQL continue with other operations.

Just some issues to consider.

这篇关于从大表中获取随机结果的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆