优化SQL连接随机词 [英] Optimising SQL to concatenate random words

查看:32
本文介绍了优化SQL连接随机词的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个包含 6,318 个随机单词的表(j_un2"),那里的总数永远不会改变.表中的 ID 是无间隙的.

I have a table ("j_un2") with 6,318 random words in, the total count there will never change. The IDs in the table are gapless.

我需要生成一个由 2 个单词组成的 5 个随机串联字符串的列表,其中字符串的总长度为 8 个字符.

I need to generate a list of 5 random concatenated strings made up of 2 words, where the total length of the string is 8 characters long.

几天前我已经通过以下方式获得了一些非常有用的帮助:从表中选择随机单词

I already got some very useful help a few days ago via: Selecting random words from table

我的基本方法是从表中选择两次,将随机选择的单词连接起来.

My basic method is to select from the table twice, and concatenate the randomly selected words.

我有一个fld_len"列,它是单词的长度.

I have a "fld_len" column which is the length of the word.

表结构:

CREATE TABLE `j_un2` (
  `fld_id` int(11) NOT NULL AUTO_INCREMENT,
  `fld_un` varchar(255) DEFAULT NULL,
  `fld_cat_id` int(11) DEFAULT NULL,
  `fld_len` int(2) NOT NULL,
  PRIMARY KEY (`fld_id`),
  KEY `cat` (`fld_cat_id`),
  KEY `bob` (`fld_len`,`fld_un`)
);

该表具有以下索引:

Keyname     Type    Field
PRIMARY     PRIMARY fld_id
bob         INDEX   fld_len, fld_un

如果我执行 ORDER BY RAND(),我发现了主要的性能问题.在 StackOverflow 和这里阅读后:http://www.warpconduit.net/2011/03/23/selecting-a-random-record-using-mysql-benchmark-results/

I found major performance issues if I did an ORDER BY RAND(). After reading on StackOverflow and here: http://www.warpconduit.net/2011/03/23/selecting-a-random-record-using-mysql-benchmark-results/

我把这个查询缩短到大约 3.7 秒:

I got this query down to about 3.7 seconds:

   SELECT CONCAT(w1.fld_un, w2.fld_un) bbb
        , FLOOR(1 + RAND() * 6318) 'rand_ind'
     FROM j_un2 w1
        , j_un2 w2 
    WHERE w1.fld_len = 8 - w2.fld_len
      AND w2.fld_len < 8
      AND RAND()<(((1/6318)*10)) 
 ORDER BY rand_ind
    LIMIT 20;

这是查询的解释计划:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE          w2      range   bob             bob     4           NULL    5886    Using where; Using index; Using temporary; Using filesort
1   SIMPLE          w1      ref     bob             bob     4           func    63      Using where; Using index

我想使用此查询的页面每月获得大约 500k 的页面浏览量,因此非常繁忙(无论如何对我来说),如果用户每次页面刷新都必须等待大约 4 秒,他们可能会对此感到恼火.

The page I'd like to use this query on gets about 500k page views a month, so is quite busy (for me anyway), and if users had to wait about 4 seconds for each page refresh, they'd probably get annoyed with it.

我也尝试在选择单词后执行 CONCAT,但运行需要 10 秒:

I did also try to do the CONCAT after selecting the words, but that took 10 seconds to run:

SELECT CONCAT(word1, word2) joined
FROM
   (SELECT w1.fld_un word1, w2.fld_un word2
        , FLOOR(1 + RAND() * 6318) 'rand_ind'
     FROM j_un2 w1
        , j_un2 w2 
    WHERE w1.fld_len = 10 - w2.fld_len
      AND w2.fld_len < 10
      AND RAND()<(((1/6318)*10)) 
 ORDER BY rand_ind
    LIMIT 20) bob;

鉴于我试图通过一种相当复杂的连接方法连接两个表,我想知道这个查询是否像以往一样快地运行,或者是否有任何空间可以加快速度?

Given I'm trying to join two tables via a rather made up join method, I wondered if this query is running as fast as it ever will, or if there is any scope to speed it up?

更新 1

实际上,我认为性能下降取决于表连接机制,如下所示:

Actually, I think the performance hit is down to the table join mechanics, as this:

   SELECT CONCAT(w1.fld_un, w2.fld_un) bbb
     FROM j_un2 w1
        , j_un2 w2 
    WHERE w1.fld_len = 8 - w2.fld_len
      AND w2.fld_len < 8
      AND RAND()<(((1/6318)*10)) 
 ORDER BY rand()
    LIMIT 20;

同时运行 - 例如按 rand() 订购没有任何区别

Runs in the same time - e.g. doesn't make any difference to order by rand()

推荐答案

此处提供了仅 SQL 的解决方案:http://mysql.rjweb.org/doc.php/random#case_consecutive_auto_increment.总之,它基于随机获取一行,假设 id 中没有间隙.这样做两次以获得两行.不需要表扫描.如果需要避免两次获得相同的单词,仍然需要代码来重试.

An SQL-only solution is found here: http://mysql.rjweb.org/doc.php/random#case_consecutive_auto_increment . In summary it fetches one row at random based, assuming there are no gaps in the id. Do that twice to get two rows. No table scan is needed. There would still need to be code to try again if need to avoid getting the same word twice.

如何使两行的总长度等于8个字符?这是一个有趣的转折.对于第二个查询:添加一个 INDEX(len, id),然后在剩余长度内使用 MIN 和 MAX id.然后调整该博客中的下一个案例"(AUTO_INCREMENT 有间隙)以找到第二行.(好吧,这有点乱,但希望我已经给了你一些需要考虑的事情.)

How to make the total length of the two rows equal 8 characters? That's an interesting twist. For the second query: Add an INDEX(len, id), then use the MIN and MAX id within the remaining length. Then adapt the next 'Case' in that blog (AUTO_INCREMENT with gaps) to find the second row. (OK, that gets messy, but hopefully I have given you some things to think about.)

该链接有 8 种有效"替代 ORDER BY RAND() LIMIT n.(没有一个是完美的".)

That link has 8 'efficient' alternatives to ORDER BY RAND() LIMIT n. (None are 'perfect'.)

这篇关于优化SQL连接随机词的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆