如何计算 Teradata 中的词频 [英] How to count words frequency in Teradata

查看：40 发布时间：2021/9/5 20:25:00 sql teradata word-cloud

本文介绍了如何计算 Teradata 中的词频的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

例如，如果我有 1000 行数据，其中包含客户 ID(例如 123)以及他们对我们产品的评论(例如出色的产品易于使用)

For example if I have 1000 rows of data that has customer ID (e.g. 123) and their comments on our product (e.g. great product easy use)

我如何使用 Teradata(版本 15)进行词频计数，以便输出有两列，一列是单词，另一列是频率，例如(很棒:20，产品:10)?

How do I use Teradata (version 15) to do a word frequency count so that the output has two columns one with the word and the other with frequency e.g. (Great: 20, Product: 10)?

谢谢

推荐答案

您可以使用 strtok_split_to_table 来解决这个问题.

You could use strtok_split_to_table to pull this off.

类似于以下内容:

SELECT d.token, SUM(d.outkey)
FROM TABLE (strtok_split_to_table(1, <yourtable>.<yourcommentsfield>, ' ')
        RETURNS (outkey integer, tokennum integer, token varchar(20)character set unicode) ) as d 
GROUP BY 1

这会将您评论字段中的每个单词拆分为单独的记录，然后计算每个单词的出现次数.只需将您自己的 <yourtable>. 粘贴在其中，您就可以开始使用了.


This will split each word in your comments field into individual records, then it counts the occurrence of each word. Just stick your own <yourtable>.<yourcommentsfield> in there and you should be good to go.
有关 strtok_split_to_table 的更多信息:http://www.info.teradata.com/HTMLPubs/DB_TTU_14_00/index.html#page/SQL_Reference/B035_1145_111A/String_Ops_Funcs.084.242.html
More information on strtok_split_to_table: http://www.info.teradata.com/HTMLPubs/DB_TTU_14_00/index.html#page/SQL_Reference/B035_1145_111A/String_Ops_Funcs.084.242.html
这是在我的系统上测试的 SQL 和结果:
Here is the SQL and results for a test on my system:
CREATE SET TABLE db.testcloud ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      customer VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
      comments VARCHAR(1000) CHARACTER SET LATIN NOT CASESPECIFIC)
PRIMARY INDEX ( customer );


INSERT INTO testcloud (1, 'This is a test comment');
INSERT INTO testcloud (2, 'This is also comment of something');

SELECT d.token, SUM(d.outkey)
FROM TABLE (TD_SYSFNLIB.strtok_split_to_table(1, testcloud.comments, ' -/')
        RETURNS (outkey integer, tokennum integer, token varchar(20)character set unicode) ) as d 
GROUP BY 1

--token Sum(outkey)
--is    2
--also  1
--This  2
--of    1
--test  1
--a 1
--comment   2
--something 1


                        这篇关于如何计算 Teradata 中的词频的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

如何计算 Teradata 中的词频 [英] How to count words frequency in Teradata

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

如何计算 Teradata 中的词频 [英] How to count words frequency in Teradata

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭