How to create a query in SQL to chop sentences into words and add them to a new table with their frequency

Problem description

I'm trying to write a query and I'm not sure it's possible. I have a table called sentences, which contains ID, Sentences, and verify columns, as shown in the picture below.

I have another table called word count, which contains ID, words, and their frequency. Whenever a sentence is entered, updated, or deleted, I want this table to be updated accordingly, or updated once a day, because there might be a lot of sentences.

My expected output is something like the picture below.

Any ideas? Is this doable? Can anyone help, please?

Solution

If you are running MySQL 8.0, I would recommend a recursive common table expression for this. The idea is to iteratively walk each message, splitting it into words along the way. All that is then left to do is to aggregate.

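with recursive cte as (
    -- anchor: split off the first word of each sentence; a space is appended
    -- so that the last word (or a one-word sentence) is terminated too
    select 
        substring(concat(sent, ' '), 1, locate(' ', concat(sent, ' ')) - 1) word,
        substring(concat(sent, ' '), locate(' ', concat(sent, ' ')) + 1) sent
    from messages
    union all
    -- recursive step: split off the next word until the remainder is empty
    select 
        substring(sent, 1, locate(' ', sent) - 1) word,
        substring(sent, locate(' ', sent) + 1) sent
    from cte
    where locate(' ', sent) > 0
)
-- aggregate: one row per distinct word, numbered by descending frequency
select row_number() over(order by count(*) desc, word) wid, word, count(*) freq
from cte 
where word <> ''  -- skip empty tokens from repeated spaces or empty sentences
group by word
order by wid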

In earlier versions, you could emulate the same behavior with a numbers table.
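One common shape of that emulation is sketched here, run in SQLite through Python's built-in sqlite3 module so it can be tried without a MySQL server. It assumes a helper table numbers(n) holding 1, 2, 3, ... up to at least the word count of the longest sentence. Word n of a sentence is taken as the last space-separated piece of its first n words. SQLite has no SUBSTRING_INDEX, so a small Python stand-in is registered under that name purely for illustration; it is not part of SQLite. Because MySQL before 8.0 has no window functions, the sketch returns (word, freq) pairs without the wid column. `word_frequencies_numbers` and `max_words` are hypothetical names, and `collate nocase` is SQLite syntax used to approximate MySQL's case-insensitive ordering.

```python
import sqlite3

# The three sample sentences from the answer's demo.
SENTENCES = [
    "hello my name is alex",
    "hey alin and alex I'm tom",
    "hello alex my name is alin",
]


def substring_index(s, delim, count):
    """Python stand-in for MySQL's SUBSTRING_INDEX(str, delim, count)."""
    if s is None or delim is None or count is None:
        return None
    if delim == "":
        return ""  # guard: str.split() rejects an empty separator
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])  # text left of the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])  # text right of the count-th delimiter from the end
    return ""


# Word n of a sentence is the last piece of its first n words; the join keeps
# n no larger than the sentence's word count (number of spaces + 1).
NUMBERS_SQL = """
select word, count(*) as freq
from (
    select substring_index(substring_index(m.sent, ' ', n.n), ' ', -1) as word
    from messages m
    join numbers n
      on n.n <= length(m.sent) - length(replace(m.sent, ' ', '')) + 1
) as words
where word <> ''
group by word
order by freq desc, word collate nocase
"""


def word_frequencies_numbers(sentences, max_words=100):
    """Return (word, freq) rows; words beyond max_words in a sentence are ignored."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.create_function("substring_index", 3, substring_index)
        conn.execute("create table messages (sent text)")
        conn.execute("create table numbers (n integer primary key)")
        conn.executemany("insert into messages (sent) values (?)",
                         [(s,) for s in sentences])
        conn.executemany("insert into numbers (n) values (?)",
                         [(i,) for i in range(1, max_words + 1)])
        return conn.execute(NUMBERS_SQL).fetchall()
    finally:
        conn.close()
```

In MySQL 5.7 itself the built-in SUBSTRING_INDEX would be used directly, and the numbers table only needs to be created once.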

Demo on DB Fiddle

Sample data:

sent                       | verif
:------------------------- | ----:
hello my name is alex      |  null
hey alin and alex I'm tom  |  null
hello alex my name is alin |  null

Results:

wid | word   | freq
--: | :----- | ---:
  1 | alex   |    3
  2 | alin   |    2
  3 | hello  |    2
  4 | is     |    2
  5 | my     |    2
  6 | name   |    2
  7 | and    |    1
  8 | hey    |    1
  9 | I'm    |    1
 10 | tom    |    1
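To check the splitting logic without a MySQL server, here is a minimal sketch that runs the recursive CTE against the same three sample rows in SQLite, through Python's built-in sqlite3 module. It is a translation, not the MySQL query itself. instr() replaces locate() (note the swapped argument order), substr() replaces substring(), and || replaces concat(). `collate nocase` in the window ordering approximates the case-insensitive order in the results table, because SQLite compares text case-sensitively by default. The sketch assumes SQLite 3.25 or later for row_number(). `word_frequencies` is a hypothetical helper name.

```python
import sqlite3

# The three sample rows from the answer (table messages, column sent).
SENTENCES = [
    "hello my name is alex",
    "hey alin and alex I'm tom",
    "hello alex my name is alin",
]

# SQLite translation of the recursive CTE: instr() stands in for MySQL's
# locate() (arguments swapped), substr() for substring(), || for concat().
WORD_FREQ_SQL = """
with recursive cte(word, sent) as (
    -- anchor: first word of each sentence; a space is appended so the
    -- last word is terminated like the others
    select substr(sent || ' ', 1, instr(sent || ' ', ' ') - 1),
           substr(sent || ' ', instr(sent || ' ', ' ') + 1)
    from messages
    union all
    -- recursive step: peel off the next word until the remainder is empty
    select substr(sent, 1, instr(sent, ' ') - 1),
           substr(sent, instr(sent, ' ') + 1)
    from cte
    where instr(sent, ' ') > 0
)
select row_number() over (order by count(*) desc, word collate nocase) as wid,
       word,
       count(*) as freq
from cte
where word <> ''   -- drop empty tokens produced by repeated spaces
group by word
order by wid
"""


def word_frequencies(sentences):
    """Load sentences into an in-memory messages table; return (wid, word, freq) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table messages (sent text, verif integer)")
        conn.executemany("insert into messages (sent) values (?)",
                         [(s,) for s in sentences])
        return conn.execute(WORD_FREQ_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for wid, word, freq in word_frequencies(SENTENCES):
        print(f"{wid:>3} | {word:<6} | {freq:>3}")
```

With the sample sentences this should produce the ten rows of the results table, and a one-word sentence such as "hello" yields a single row instead of an empty token.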

When it comes to maintaining the results of the query in a separate table, it is probably more complicated than you think: you need to be able to insert, delete or update the target table depending on the changes in the original table, which cannot be done in a single statement in MySQL. Also, keeping a flag up to date in the original table creates a race condition, where changes might occur while you are updating the target table.

A simpler option would be to put the query in a view, so you get an always-up-to-date perspective on your data. For this, you can just wrap the above query in a create view statement, like:

create view words_view as < above query >;
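A rough way to see the view staying current is the SQLite sketch below, again via Python's sqlite3 module and again a dialect translation (instr/substr/|| instead of locate/substring/concat, `collate nocase` to approximate MySQL's case-insensitive ordering; SQLite 3.25+ assumed for row_number()). It creates words_view over a messages table, then shows that inserts, updates and deletes on messages are reflected on the next read with no maintenance code at all. `make_db`, `freq_of` and the `id` column are hypothetical, chosen for illustration.

```python
import sqlite3

# words_view from the answer, written in SQLite dialect.
CREATE_VIEW_SQL = """
create view words_view as
with recursive cte(word, sent) as (
    select substr(sent || ' ', 1, instr(sent || ' ', ' ') - 1),
           substr(sent || ' ', instr(sent || ' ', ' ') + 1)
    from messages
    union all
    select substr(sent, 1, instr(sent, ' ') - 1),
           substr(sent, instr(sent, ' ') + 1)
    from cte
    where instr(sent, ' ') > 0
)
select row_number() over (order by count(*) desc, word collate nocase) as wid,
       word,
       count(*) as freq
from cte
where word <> ''
group by word
"""


def make_db():
    """Hypothetical setup: in-memory messages table with words_view on top."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table messages (id integer primary key, sent text, verif integer)")
    conn.execute(CREATE_VIEW_SQL)
    return conn


def freq_of(conn, word):
    """Current frequency of one word, read from the view (0 if absent)."""
    row = conn.execute("select freq from words_view where word = ?", (word,)).fetchone()
    return row[0] if row else 0


if __name__ == "__main__":
    conn = make_db()
    conn.execute("insert into messages (id, sent) values (1, 'hello my name is alex')")
    conn.execute("insert into messages (id, sent) values (2, 'hello alex my name is alin')")
    print(freq_of(conn, "alex"))  # 2
    conn.execute("update messages set sent = 'hey alex and alex' where id = 1")
    print(freq_of(conn, "alex"))  # 3
    conn.execute("delete from messages where id = 2")
    print(freq_of(conn, "alex"))  # 2
```

The trade-off is that every read re-splits all sentences, which is what the periodic refresh option addresses.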

If performance becomes a problem, then you could also truncate and refill the words table periodically.

truncate table words;
insert into words < above query >;
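The refresh can be sketched in SQLite as well, through Python's sqlite3 module, with the same dialect substitutions (instr/substr/|| for locate/substring/concat, `collate nocase` for ordering, SQLite 3.25+ for row_number()). SQLite has no TRUNCATE TABLE, so the sketch uses DELETE. It also runs both statements in one transaction. In MySQL, TRUNCATE TABLE causes an implicit commit, so concurrent readers may briefly see an empty words table. A DELETE followed by INSERT inside one transaction avoids that gap at the cost of a slower delete on large tables, which may or may not matter for your data. The schema, `make_db` and `refresh_words` are hypothetical.

```python
import sqlite3

# Refill statement for the words table (SQLite dialect of the answer's query).
REFILL_SQL = """
insert into words (wid, word, freq)
with recursive cte(word, sent) as (
    select substr(sent || ' ', 1, instr(sent || ' ', ' ') - 1),
           substr(sent || ' ', instr(sent || ' ', ' ') + 1)
    from messages
    union all
    select substr(sent, 1, instr(sent, ' ') - 1),
           substr(sent, instr(sent, ' ') + 1)
    from cte
    where instr(sent, ' ') > 0
)
select row_number() over (order by count(*) desc, word collate nocase),
       word,
       count(*)
from cte
where word <> ''
group by word
"""


def make_db():
    """Hypothetical schema: source sentences plus the materialized words table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table messages (id integer primary key, sent text, verif integer)")
    conn.execute("create table words (wid integer primary key, word text, freq integer)")
    return conn


def refresh_words(conn):
    """Empty and refill the words table in a single transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("delete from words")  # SQLite has no TRUNCATE TABLE
        conn.execute(REFILL_SQL)
```

For the once-a-day schedule mentioned in the question, the MySQL version of this could be run from the Event Scheduler (CREATE EVENT ... ON SCHEDULE EVERY 1 DAY) or from an external cron job.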
