从表中选择随机单词 [英] Selecting random words from table

查看:114
本文介绍了从表中选择随机单词的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

说我有以下基本MySQL数据:

Say I have the following basic MySQL data:

CREATE TABLE my_words (my_word VARCHAR(255));

INSERT INTO my_words VALUES ('dog');
INSERT INTO my_words VALUES ('cat');
INSERT INTO my_words VALUES ('tree');
INSERT INTO my_words VALUES ('ball');
INSERT INTO my_words VALUES ('life');
INSERT INTO my_words VALUES ('complex');
INSERT INTO my_words VALUES ('digeridoo');
INSERT INTO my_words VALUES ('hamster');
INSERT INTO my_words VALUES ('it');
INSERT INTO my_words VALUES ('house');
INSERT INTO my_words VALUES ('love');
INSERT INTO my_words VALUES ('zealous');
INSERT INTO my_words VALUES ('nevis');
INSERT INTO my_words VALUES ('mountain');
INSERT INTO my_words VALUES ('call');
INSERT INTO my_words VALUES ('nail');
INSERT INTO my_words VALUES ('rat');
INSERT INTO my_words VALUES ('hat');

SELECT CONCAT(w1.my_word, w2.my_word) joined
FROM my_words w1, my_words w2
WHERE LENGTH(CONCAT(w1.my_word, w2.my_word)) = 8
ORDER BY RAND() LIMIT 5;

我可以在末尾编写SQL语句,以生成由2个单词组成的5个随机串联字符串的列表,其中字符串的总长度为8个字符.

I can write the SQL statement at the end to generate a list of 5 random concatenated strings made up of 2 words, where the total length of the string is 8 characters.

对于像我在样本中一样的简单数据表,这很好用.

That works fine for a simple table of data like I've got in the sample.

但是,我正在使用的真实"表包含大约6,200行.

However, the "real" table I am working with contains about 6,200 lines.

如果我尝试相同类型的语句,则需要10秒才能生成5个字符串.

If I try the same type of statement it takes 10 seconds to generate the 5 strings.

我想SQL效率很低,因为它每次都两次查询该表,并且这些表没有以任何方式联接.

I guess the SQL is very inefficient as it is searching through the table twice each time and those tables aren't joined in any way.

我想知道是否存在一种更简单的方法来从表中提取由2个单词组成的单词字符串,其中连接字符串的长度为8个字符长(尽管可以更改-我只是将8用作例如).

I wondered if there is a simpler way to extract strings of words made up of 2 words from the table, where the length of the concatenated string is 8 characters long (though that can change - I'm just using 8 as an example).

谢谢

更新1

说明计划:

EXPLAIN
SELECT CONCAT(w1.fld_un, w2.fld_un) joined
FROM j_un w1
JOIN j_un w2 ON w1.fld_len = 8 - w2.fld_len
WHERE w2.fld_len < 8
ORDER BY RAND()
LIMIT 5;

id  select_type table   type    possible_keys   key     key_len     ref rows    Extra
1   SIMPLE      w2      range   un_len          un_len  5   \N      2694        Using where; Using temporary; Using filesort
1   SIMPLE      w1      ref     un_len          un_len  5   func    527         Using where


更新2

我不确定是否相关,但是"fld_un"表大约有6,200行.

I'm not sure if it's relevant, but the "fld_un" table has about 6,200 rows.

单词"保存在"fld_un"列中.

The "word" is held in the "fld_un" column.

表的结构为:

Field       Type            Null    Key     Default     Extra
fld_id      int(11)         NO      PRI     NULL        auto_increment
fld_un      varchar(255)    YES             NULL     
fld_cat_id  int(11)         YES     MUL     NULL     
fld_len     int(2)          NO      MUL     NULL    

这些索引存在于表中:

Keyname     Type    Cardinality Field
PRIMARY     PRIMARY 6318        fld_id
cat         INDEX   15          fld_cat_id
bob         INDEX   11          fld_len 

表上已经有一个主索引是否重要?从技术上讲,我不需要.

Does it matter that there is already a primary index on the table? I don't technically need that I suppose.

声明:

SELECT CONCAT(word1, word2) joined
FROM (
    SELECT w1.fld_un word1, w2.fld_un word2
    FROM j_un2 w1
    JOIN j_un2 w2 ON w1.fld_len = 8 - w2.fld_len
    WHERE w2.fld_len < 8
    ORDER BY RAND()
    LIMIT 5) x;

查询耗时23.6805秒

Query took 23.6805 sec

说明计划:

id  select_type     table       type    possible_keys   key     key_len     ref     rows    Extra
1   PRIMARY         <derived2>  ALL     NULL            NULL    NULL        NULL    5    
2   DERIVED         w2          range   bob             bob     4           NULL    4627    Using where; Using temporary; Using filesort
2   DERIVED         w1          ref     bob             bob     4           func    527     Using where

当我按照Thorsten Kettner的建议修改"bob"索引以包括2列时:

When I revise the "bob" index to include 2 columns as suggested by Thorsten Kettner:

Keyname     Type    Cardinality Field
bob         INDEX   11          fld_len, fld_un

然后重新测试:

SELECT CONCAT(word1, word2) joined
FROM (
    SELECT w1.fld_un word1, w2.fld_un word2
    FROM j_un2 w1
    JOIN j_un2 w2 ON w1.fld_len = 8 - w2.fld_len
    WHERE w2.fld_len < 8
    ORDER BY RAND()
    LIMIT 5) x;

查询花了30.3394秒返回5行.

The query took 30.3394 sec to return 5 rows.

说明计划:

id  select_type     table       type    possible_keys   key     key_len     ref     rows    Extra
1   PRIMARY         <derived2>  ALL     NULL            NULL    NULL        NULL    5    
2   DERIVED         w2          range   bob             bob     4           NULL    4211    Using where; Using temporary; Using filesort
2   DERIVED         w1          ref     bob             bob     4           func    527     Using where


更新3

没有按rand()命令"运行,运行了0.0011秒!

Ran without "order by rand()" and it ran in 0.0011 sec!

推荐答案

您可以添加一列,例如包含单词长度的word_length,并在word_length列上添加索引.通常,包含可以从另一列派生的数据的设计会很糟糕,但是在这种情况下,为了性能起见,您需要破坏纯度.然后,您的查询可以在此列中使用JOIN条件:

You can add a column, e.g. word_length that contains the length of the word, and add an index on the word_length column. Normally it would be poor design to include data that can be derived from another column, but in this case you need to break purity for performance sake. Then your query can use a JOIN condition using this column:

SELECT CONCAT(w1.my_word, w2.my_word) joined
FROM my_words w1
JOIN my_words w2 ON w1.word_length = 8 - w2.word_length
WHERE w2.word_length < 8
ORDER BY RAND()
LIMIT 5

您可以使用INSERTUPDATE触发器自动填充word_length列.

You can use INSERT and UPDATE triggers to fill in the word_length column automatically.

在过滤到5行之后进行连接可能也会有所帮助:

It might also be helpful to do the concatenation after filtering down to the 5 rows:

SELECT CONCAT(word1, word2) joined
FROM (
    SELECT w1.my_word word1, w2.my_word word2
    FROM my_words w1
    JOIN my_words w2 ON w1.word_length = 8 - w2.word_length
    WHERE w2.word_length < 8
    ORDER BY RAND()
    LIMIT 5) x

这篇关于从表中选择随机单词的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆