Fastest random selection WHERE column X is Y (NULL)

Question

Currently I am using:

SELECT * 
FROM 
  table AS t1
  JOIN (
    SELECT (RAND() * (SELECT MAX(id) FROM table where column_x is null)) AS id
  ) AS t2 
WHERE 
  t1.id >= t2.id
  and column_x is null
ORDER BY t1.id ASC
LIMIT 1

This is normally extremely fast; however, when I include the condition that column_x is Y (NULL), it gets slow.

What is the fastest way to select a random record whose column X is NULL?

ID is the PK, and column X is int(4). The table contains about a million records, is over 1 GB in total size, and is currently doubling in size every 24 hours.

column_x is indexed.

The values in the ID column may not be consecutive.

The DB engine used in this case is InnoDB.

Thank you.

Solution

Getting a genuinely random record can be slow. There's not really much getting around this fact; if you want it to be truly random, then the query has to load all the relevant data in order to know which records it has to choose from.
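To make the cost concrete, here is a minimal sketch of the "genuinely random" baseline, using Python's built-in sqlite3 module as a stand-in for MySQL/InnoDB (in MySQL the function is RAND() rather than RANDOM()). The table name `items` and the data are hypothetical. The point is that ORDER BY RANDOM() has to evaluate a random value for every row that passes the filter before LIMIT 1 can pick one, so the work grows with the number of NULL rows.

```python
import sqlite3

# Hypothetical table standing in for the question's InnoDB table:
# integer PK "id" plus a nullable int "column_x" (every third row is NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, column_x INTEGER)")
conn.executemany(
    "INSERT INTO items (id, column_x) VALUES (?, ?)",
    [(i, None if i % 3 == 0 else i) for i in range(1, 1001)],
)


def truly_random_null_row(conn):
    # Uniformly random, but every matching row is visited and given a
    # random sort key before LIMIT 1 applies: cost is O(matching rows).
    return conn.execute(
        "SELECT id, column_x FROM items "
        "WHERE column_x IS NULL ORDER BY RANDOM() LIMIT 1"
    ).fetchone()


row = truly_random_null_row(conn)
print(row)
```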

Fortunately, however, there are quicker ways of doing it. They aren't properly random, but if you're happy to trade a bit of pure randomness for speed, they should be good enough for most purposes.

With that in mind, the fastest way to get a "random" record is to add an extra column to your table and populate it with a random value. Perhaps a salted MD5 hash of the primary key? Whatever works. Add an appropriate index on this column (since your query also filters on column_x IS NULL, a composite index on (column_x, random_field) lets a single index serve both the filter and the ordering), then simply add the column to the ORDER BY clause of your query, and you'll get your records back in a random order.
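A rough sketch of those steps, again with sqlite3 standing in for MySQL. The names `items`, `rand_key` and `idx_x_rand` are hypothetical, and a random float in [0, 1) is used instead of a salted MD5 hash purely for simplicity; any indexable random value should behave the same way. The composite index puts `column_x` first so that the IS NULL filter and the ORDER BY on the random column can both be satisfied from the index.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, column_x INTEGER)")
conn.executemany(
    "INSERT INTO items (id, column_x) VALUES (?, ?)",
    [(i, None if i % 3 == 0 else i) for i in range(1, 1001)],
)

# Step 1: add the extra column (hypothetical name "rand_key").
conn.execute("ALTER TABLE items ADD COLUMN rand_key REAL")

# Step 2: fill it with random values (a float here, not an MD5 hash).
ids = [r[0] for r in conn.execute("SELECT id FROM items")]
conn.executemany(
    "UPDATE items SET rand_key = ? WHERE id = ?",
    [(random.random(), i) for i in ids],
)

# Step 3: composite index, filter column first, random column second.
conn.execute("CREATE INDEX idx_x_rand ON items (column_x, rand_key)")

# Rows with column_x IS NULL now come back in a random (but fixed) order.
shuffled = conn.execute(
    "SELECT id FROM items WHERE column_x IS NULL ORDER BY rand_key"
).fetchall()
print([r[0] for r in shuffled[:5]])
```

Note that the order only changes when `rand_key` is rewritten, which is exactly the "stuck in the same random order" trade-off described below.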

To get a single random record, simply specify LIMIT 1 and add WHERE random_field > $random_value, where $random_value is a value in the range of your new field (for example, an MD5 hash of a random number).
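A minimal sketch of the single-record lookup, under the same assumptions as above (sqlite3 stand-in, hypothetical `items`/`rand_key` names, float keys in [0, 1)). The Python variable `threshold` plays the role of `$random_value`. The second query, which wraps around to the smallest key when the threshold lands above every stored key, is an addition not in the original answer; without it, such a threshold would return no row. Each row's chance of being picked is proportional to the gap just below its key, so the result is not perfectly uniform, in line with the trade-off described above.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, column_x INTEGER, rand_key REAL)"
)
conn.executemany(
    "INSERT INTO items (id, column_x, rand_key) VALUES (?, ?, ?)",
    [(i, None if i % 3 == 0 else i, random.random()) for i in range(1, 1001)],
)
conn.execute("CREATE INDEX idx_x_rand ON items (column_x, rand_key)")


def random_null_row(conn):
    threshold = random.random()  # stands in for $random_value
    # Index range scan: first NULL row whose key is above the threshold.
    row = conn.execute(
        "SELECT id, column_x FROM items "
        "WHERE column_x IS NULL AND rand_key > ? "
        "ORDER BY rand_key LIMIT 1",
        (threshold,),
    ).fetchone()
    if row is None:
        # Threshold was above every key: wrap around to the smallest one.
        row = conn.execute(
            "SELECT id, column_x FROM items "
            "WHERE column_x IS NULL ORDER BY rand_key LIMIT 1"
        ).fetchone()
    return row


print(random_null_row(conn))
```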

Of course, the downside here is that although your records will be in a random order, they'll be stuck in the same random order. I did say it was trading perfection for query speed. You can get around this by periodically updating them with fresh values, but I guess that could be a problem for you if you need to keep them fresh.
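One way the periodic refresh might look, sketched with sqlite3 and the same hypothetical `items`/`rand_key` names; the function `refresh_rand_keys` and its `batch_size` parameter are illustrative, not from the original answer. It walks the table by primary key in small batches (keyset pagination), so gaps in the ID sequence do not matter and each transaction stays short, which is likely to matter on a table that is growing as fast as the one in the question.

```python
import random
import sqlite3


def refresh_rand_keys(conn, batch_size=500):
    """Give every row a fresh rand_key, one small batch at a time.

    Pages through the table by primary key, so non-consecutive ids are
    fine and each committed transaction touches at most batch_size rows.
    """
    last_id = 0
    while True:
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        )]
        if not ids:
            break
        with conn:  # commits this batch on success
            conn.executemany(
                "UPDATE items SET rand_key = ? WHERE id = ?",
                [(random.random(), i) for i in ids],
            )
        last_id = ids[-1]


# Demo data: hypothetical "items" table with deliberately gappy ids.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, column_x INTEGER, rand_key REAL)"
)
conn.executemany(
    "INSERT INTO items (id, column_x, rand_key) VALUES (?, ?, ?)",
    [(i * 7, None if i % 3 == 0 else i, random.random()) for i in range(1, 1001)],
)
conn.commit()

before = dict(conn.execute("SELECT id, rand_key FROM items"))
refresh_rand_keys(conn, batch_size=100)
after = dict(conn.execute("SELECT id, rand_key FROM items"))
```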

The other downside is that adding an extra column might be too much to ask if you have storage constraints and your DB is already massive, or if you have a strict DBA to get past before you can add columns. But again, you have to trade off something: if you want the query speed, you need this extra column.

Anyway, I hope that helped.
