Determining the most used set of words (PHP, MySQL)


Question

I'm trying to figure out how to determine the most used words in a MySQL dataset.

I'm not sure how to go about this, or whether there's a simpler approach. I've read a couple of posts where people suggest using an algorithm.

Example:

From 24,500 records, find the 10 most used words.

Recommended answer

Right, this runs like a dog and only works with a single delimiter, but hopefully it will give you an idea.

SELECT aWord, COUNT(*) AS WordOccuranceCount
FROM (
    -- Pull out the aCnt-th space-delimited word of SomeColumn
    SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(SomeColumn, ' '), ' ', aCnt), ' ', -1) AS aWord
    FROM SomeTable
    CROSS JOIN (
        -- Generate the numbers 1 to 1000
        SELECT a.i + b.i*10 + c.i*100 + 1 AS aCnt
        FROM integers a, integers b, integers c
    ) Sub1
    -- Keep only word positions that exist in this row (number of spaces + 1)
    WHERE (LENGTH(SomeColumn) + 1 - LENGTH(REPLACE(SomeColumn, ' ', ''))) >= aCnt
) Sub2
WHERE Sub2.aWord != ''
GROUP BY aWord
ORDER BY WordOccuranceCount DESC
LIMIT 10

This relies on having a table called integers with a single column called i, containing 10 rows with the values 0 to 9. As written it copes with up to 1,000 words per record; it can easily be altered to cope with more, but will slow down even further.
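
The answer does not show how the integers helper table is built, so here is a minimal sketch of one way to set it up in MySQL. The column type and primary key are assumptions; only the table name, the column name i, and the values 0 to 9 come from the description above. The second statement is just a sanity check on the number generator used in the query.

```sql
-- Helper table described in the answer: one column i holding 0..9.
-- INT and PRIMARY KEY are assumptions, not part of the original answer.
CREATE TABLE integers (
    i INT NOT NULL PRIMARY KEY
);

INSERT INTO integers (i)
VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9);

-- Sanity check: three copies of the table cross-joined give 10 * 10 * 10 rows,
-- and a.i + b.i*10 + c.i*100 + 1 covers each value from 1 to 1000 exactly once.
SELECT COUNT(*) AS n, MIN(aCnt) AS lo, MAX(aCnt) AS hi
FROM (
    SELECT a.i + b.i*10 + c.i*100 + 1 AS aCnt
    FROM integers a, integers b, integers c
) Sub1;
```

To cope with more than 1,000 words per record, the same pattern could presumably be extended with a fourth alias (for example integers d and a d.i*1000 term), at the cost of a tenfold larger cross join.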
