Multiple text values in one column: need a query to find the most frequently repeated word


Problem description

I have a column that stores the bio/title of users. It is free text written by the users and can contain any number of words.

id title
1  Business Development Executive Cold Calling & Cold Emailing expert Entrepreneur
2  Director of Online Marketing and entrepreneur
3  Art Director and Entrepreneur 
4  Corporate Development at Yahoo!
5  Snr Program Manager, Yahoo 

I am trying to figure out a MySQL query that shows word frequencies:

Entrepreneur 3
development  2
director     2 

I know that if I could return each word in the value as a separate row, I could then use normal grouping. I have looked for, but cannot find, a function that splits text into words, each on a separate row.

Is this possible?

Recommended answer

You can do it by joining with a manufactured number series that is used to pick out the nth word. Unfortunately, MySQL has no built-in method of generating a series, so it's a little ugly, but here it is:

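-- Pick the num-th word: keep the first num words, then take the last of them.
select
  substring_index(substring_index(title, ' ', num), ' ', -1) word,
  count(*) count
from job j
-- Hard-coded number series 1..12, one row per possible word position.
join (select 1 num union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 10 union select 11 union select 12) n
-- Keep only positions that exist: a title has (number of spaces + 1) words.
on length(title) >= length(replace(title, ' ', '')) + num - 1
group by 1
order by 2 desc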
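
To see why the nested substring_index calls return the nth word, the two steps can be run on a literal string. This is only a minimal sketch: the sample string is the title from row 3 of the question, and the positions 2 and 9 are chosen arbitrarily for illustration.

```sql
-- Step 1: keep everything before the 2nd space, i.e. the first two words.
select substring_index('Art Director and Entrepreneur', ' ', 2);
-- returns 'Art Director'

-- Step 2: from that prefix, keep everything after its last space.
select substring_index(substring_index('Art Director and Entrepreneur', ' ', 2), ' ', -1);
-- returns 'Director'

-- If the position exceeds the number of words, the inner call returns the
-- whole string, so the last word would be repeated for every extra position.
select substring_index(substring_index('Art Director and Entrepreneur', ' ', 9), ' ', -1);
-- returns 'Entrepreneur'
```

The last statement shows why the join condition compares num against the number of spaces in the title: without it, the final word of each title would be counted once for every unused number in the series.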

See a live demo on SQLFiddle using your data and producing your expected output.

Sadly, having to hard-code every value of the number series also caps the number of words per column value that will be processed (12 in this case). Having more numbers in the series than needed does no harm, so you can always add more to cover longer expected input text.
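
If the server is MySQL 8.0 or later, a recursive common table expression can generate the number series instead of spelling it out. The following sketch is not part of the original answer; it assumes MySQL 8.0+, reuses the job table and title column from the query above, and uses an arbitrary upper bound of 50 words per title.

```sql
-- Assumption: MySQL 8.0 or later (recursive CTEs are not available before 8.0).
with recursive n (num) as (
  select 1
  union all
  -- 50 is an illustrative cap on words per title; raise it as needed.
  select num + 1 from n where num < 50
)
select
  substring_index(substring_index(title, ' ', num), ' ', -1) as word,
  count(*) as word_count
from job j
join n
  -- Same guard as before: only positions up to (number of spaces + 1).
  on length(title) >= length(replace(title, ' ', '')) + num - 1
group by word
order by word_count desc;
```

The cap still exists, but changing it means editing a single number rather than extending a chain of union select terms.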
