用于在列中显示单词频率的Sql server功能 [英] Sql server function for displaying word frequency in a column

查看:161
本文介绍了用于在列中显示单词频率的Sql server功能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一张表格,其中列出了调查中的一个自由文本输入,其中允许进入他们的回复(关于他们希望在婚礼中拥有的颜色)

I have a table that lists a freet text input from a survey where enterents were allowed to enter their responses (regarding colours they would like to have in their wedding)

我想编写一个sql函数来收集此列中的所有信息,并且订单计算每个单词的频率,按此计数对结果集进行排序。

I would like to write a sql function that gathers all the information from this column, and orders counts the frequency of each word, ordering the result set by this count.

Response
--------
Red and White
green
White and blue
Blue
Dark blue

我想按照以下方式订购上表

I would like the above table to be ordered as follows

Response  Frequency
--------  ---------
Blue      3
White     2
And       2
Red       1
Green     1

我可以在函数运行后删除所有垃圾词,如和。有谁知道产生这种行为的任何好功能?

I can strip all the rubbish words like "and" after the function has run. Does anyone know any good functions that produce this behaviour?

推荐答案

好的,这是一种享受。首先是一个分隔值的函数...

Okay this works a treat. Firstly a function to separate the values...

Alter Function dbo.SeparateValues    

(    
 @data VARCHAR(MAX),    
 @delimiter VARCHAR(10)     
)     
RETURNS     
@tbldata TABLE(col VARCHAR(MAX))    
As    
--Declare @data VARCHAR(MAX) ,@delimiter VARCHAR(10)     
--Declare @tbldata TABLE(col VARCHAR(10))    
--Set @data = 'hello,how,are,you?,234234'    
--Set @delimiter = ','    
--DECLARE @tbl TABLE(col VARCHAR(10))    
Begin    
DECLARE @pos INT    
DECLARE @prevpos INT    
SET @pos = 1     
SET @prevpos = 0    

WHILE @pos > 0     
BEGIN    
SET @pos = CHARINDEX(@delimiter, @data, @prevpos+1)    
if @pos > 0     
INSERT INTO @tbldata(col) VALUES(LTRIM(RTRIM(SUBSTRING(@data, @prevpos+1, @pos-@prevpos-1))))    
else    
INSERT INTO @tbldata(col) VALUES(LTRIM(RTRIM(SUBSTRING(@data, @prevpos+1, len(@data)-@prevpos))))    
SET @prevpos = @pos     
End    

RETURN       
END    

然后我只是将它应用到我的桌子上。 ..

then I just apply it to my table...

Select Count(*), sep.Col FROM (
        Select * FROM (
            Select value = Upper(RTrim(LTrim(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(response, ',', ' '), '.', ' '), '!', ' '), '+', ' '), ':', ' '), '-', ' '), ';', ' '), '(', ' '), ')', ' '), '/', ' '), '&', ''), '?', ' '), '  ', ' '), '  ', ' ')))) FROM Responses
        ) easyValues
        Where value <> '' 
    ) actualValues 
    Cross Apply dbo.SeparateValues(value, ' ') sep
    Group By sep.Col
    Order By Count(*) Desc

好的,所以我用我的嵌套表去了OTT,但是我已经删除了所有垃圾字符,将值分开并保持运行总计最常用的单词。

Okay, so I went OTT with my nested tables, but I've stripped out all the crap characters, separated the values and kept a running total of the most frequently used words.

这篇关于用于在列中显示单词频率的Sql server功能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆