How to get all distinct words of a specified minimum length from multiple columns in a MySQL table?

Views: 83 Posted: 2020/5/15 5:38:04 Tags: mysql sql regex text

Question

In a MySQL 5.6 database I have a table, tablename, which has (among others) three TEXT columns: col_a, col_b, col_c.

I want to extract all unique words (with words being separated by spaces) from these three columns that are at least 5 characters long. By "word" I mean any string of non-space characters, e.g. "foo-123" would be a word, as would "099423". The columns are all utf8, and the table is InnoDB.

Is there a single query to do this?

As requested, here's an example. (In the real data, col_a, col_b and col_c are TEXT fields and could contain a large number of words.)

select id, col_a, col_b, col_c from tablename;

id  | col_a              | col_b          | col_c
----|--------------------|----------------|----------------------
1   | apple orange plum  | red green blue | bill dave sue
2   | orange plum banana | yellow red     | frank james
3   | kiwi fruit apple   | green pink     | bill sarah-jane frank

expected_result: ["apple", "orange", "banana", "fruit", 
                  "green", "yellow", "frank", "james", "sarah-jane"]

I don't care about the order of the results. Thanks!

In my example above, everything is in lowercase, as that's how I happen to store everything in the real-life table this question relates to. But, for the sake of argument, if it did contain some capitalisation, I would prefer the query to ignore it (as it happens, that is how my DB is configured).

In case it helps, all of the text columns have a FULLTEXT index on them.

Here is the SQL to create the sample data:

DROP TABLE IF EXISTS `tablename`;
CREATE TABLE `tablename` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `col_a` text,
  `col_b` text,
  `col_c` text,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8;
LOCK TABLES `tablename` WRITE;
INSERT INTO `tablename` VALUES (1,'apple orange plum','red green blue','bill dave sue'),(2,'orange plum banana','yellow red','frank james'),(3,'kiwi fruit apple','green pink','bill sarah-jane frank');
UNLOCK TABLES;

Recommended answer

A shell script might be efficient...

SELECT CONCAT_WS(' ', col_a, col_b, col_c) INTO OUTFILE 'x' ... -- get the columns into a file
tr ' ' "\n" <x -- split into one word per line
awk 'length($1) >= 5' -- keep only words of at least 5 characters
sort -u -- deduplicate

This pipeline does not filter out stopwords, but sed or awk could deal with that.
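As a minimal sketch of the stopword step, one option is to keep the stopwords in a file, one per line, and drop exact matches from the word stream with grep rather than sed or awk. The helper name filter_stopwords, the file name stopwords.txt and its contents are all assumptions made for illustration; they are not part of the original answer.

```shell
# filter_stopwords FILE: read one word per line on stdin and drop every
# line that exactly matches a line of FILE (hypothetical helper name).
filter_stopwords() {
  # -v invert the match, -x match whole lines only,
  # -F treat patterns as fixed strings, -f read patterns from FILE
  grep -v -x -F -f "$1"
}

# Illustrative stopword list; the real list is up to you.
printf '%s\n' about their which > stopwords.txt

# Sample stream standing in for the one-word-per-line output of the tr step.
printf '%s\n' apple about orange which apple |
  filter_stopwords stopwords.txt |
  awk 'length($1) >= 5' |
  sort -u
```

Matching whole lines with -x matters here: without it, a stopword such as "about" would also remove longer words that merely contain it.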

 mysql -e "SELECT ... INTO OUTFILE 'x' ..." ...
 tr ' ' "\n" <x  |  awk 'length($1) >= 5'  |  sort -u
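To try the pipeline without a MySQL server, the sketch below feeds it the three sample rows from the question, written to a file the way CONCAT_WS(' ', col_a, col_b, col_c) would emit them. It also folds case, since the question prefers capitalisation to be ignored, and squeezes repeated whitespace so empty lines never reach awk. The function name distinct_words is hypothetical. Two caveats are assumptions worth checking on your system: tr may not fold case for non-ASCII characters, and whether awk's length counts bytes or characters depends on the awk implementation and locale.

```shell
# distinct_words MINLEN: read text on stdin and print each distinct
# whitespace-separated word of at least MINLEN characters, lowercased.
# (Hypothetical helper name; not part of the original answer.)
distinct_words() {
  tr '[:upper:]' '[:lower:]' |             # fold case so "Apple" and "apple" collapse
    tr -s '[:space:]' '\n' |               # one word per line, squeezing repeated whitespace
    awk -v min="$1" 'length($0) >= min' |  # keep words that are long enough
    sort -u                                # deduplicate
}

# Sample rows from the question, one concatenated row per line.
cat > x <<'EOF'
apple orange plum red green blue bill dave sue
orange plum banana yellow red frank james
kiwi fruit apple green pink bill sarah-jane frank
EOF

distinct_words 5 < x
```

On the sample data this prints the nine words listed in the question's expected result, in sorted order rather than the order shown there.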
