使用sqlite在列中查找最常见的单词? [英] Finding most common words in a column using sqlite?

查看:36
本文介绍了使用sqlite在列中查找最常见的单词?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有这样的数据:

            movie_id    comment
            1           tom cruise is great
            1           great action movie
            2           got teary eyed
            2           great cast
            1           tom cruise is hott

我想要一个函数,它根据我选择的电影 ID 返回评论中最常用的词.所以如果我查询movie_id=1,我会得到:

I'd like a function that returns the most common words in the comments, based on what movie_id I select. So if I'm querying movie_id=1, I'd get:

            tom, 2
            cruise, 2
            is, 2
            great, 2
            hott, 1
            action, 1
            movie, 1

虽然如果我查询 movie_id=2,我会得到:

While if I query movie_id=2, I'd get:

            got, 1
            teary, 1
            eyed, 1
            great, 1
            cast, 1

我看到了一些使用 tsql 的解决方案,但我以前从未使用过,也不了解代码.在 sqlite3 中寻找一种方法来做到这一点.

I saw some solutions using tsql, but I've never used that before and didn't understand the code. Looking for a way to do this in sqlite3.

推荐答案

你可以用一个非常丑陋的查询来做到这一点.

You can do this with a really ugly query.

select word, count(*) from (
select (case when instr(substr(m.comments, nums.n+1), ' ') then substr(m.comments, nums.n+1)
             else substr(m.comments, nums.n+1, instr(substr(m.comments, nums.n+1), ' ') - 1)
        end) as word
from (select ' '||comments as comments
      from m
     )m cross join
     (select 1 as n union all select 2 union all select 3
     ) nums
where substr(m.comments, nums.n, 1) = ' ' and substr(m.comments, nums.n, 1) <> ' '
) w
group by word
order by count(*) desc

这是未经测试的.内部查询需要一个数字列表(此处限制为 3 个;您可以查看如何添加更多数字).然后检查单词是否从位置 n+1 开始.一个词在一个空格之后开始,所以我在评论的开头加了一个空格.

This is untested. The inner query needs a list of numbers (limited to just 3 here; you can see how to add more). It then checks to see if a word starts at position n+1. A word starts after a space, so I put a space at the beginning of the comments.

然后它把词拉出来,用于聚合目的.

Then it pulls the word out, for aggregation purposes.

这篇关于使用sqlite在列中查找最常见的单词?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆