SQL Server function for displaying word frequency in a column
Question
I have a table that lists a free-text input from a survey, where entrants were allowed to enter their responses (regarding colours they would like to have in their wedding).
I would like to write a SQL function that gathers all the text in this column, counts the frequency of each word, and orders the result set by that count.
Response
--------
Red and White
green
White and blue
Blue
Dark blue
From the table above I would like output like this:
Response Frequency
-------- ---------
Blue 3
White 2
And 2
Red 1
Green 1
I can strip out all the rubbish words like "and" after the function has run. Does anyone know of a good function that does this?
Answer
Okay, this works a treat. First, a function that splits a string into separate values on a delimiter:
CREATE FUNCTION dbo.SeparateValues
(
    @data      VARCHAR(MAX),
    @delimiter VARCHAR(10)
)
RETURNS @tbldata TABLE (col VARCHAR(MAX))
AS
BEGIN
    -- Splits @data on @delimiter and returns one trimmed row per piece.
    -- (Use ALTER FUNCTION instead of CREATE if the function already exists.)
    DECLARE @dlen  INT
    DECLARE @start INT
    DECLARE @pos   INT

    -- DATALENGTH, not LEN: LEN ignores trailing spaces, so LEN(' ') is 0
    SET @dlen  = DATALENGTH(@delimiter)
    SET @start = 1
    SET @pos   = CHARINDEX(@delimiter, @data, @start)

    WHILE @pos > 0
    BEGIN
        -- Text between the previous delimiter (or the start) and this one
        INSERT INTO @tbldata (col)
        VALUES (LTRIM(RTRIM(SUBSTRING(@data, @start, @pos - @start))))
        -- Skip past the whole delimiter, so multi-character delimiters work too
        SET @start = @pos + @dlen
        SET @pos   = CHARINDEX(@delimiter, @data, @start)
    END

    -- Text after the last delimiter (the whole string if there was none)
    INSERT INTO @tbldata (col)
    VALUES (LTRIM(RTRIM(SUBSTRING(@data, @start, DATALENGTH(@data)))))

    RETURN
END
Then I just apply it to my table:
SELECT COUNT(*) AS Frequency, sep.col AS Word
FROM (
    SELECT * FROM (
        -- Turn punctuation into spaces, collapse double spaces, trim and upper-case
        SELECT value = UPPER(RTRIM(LTRIM(
            REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
            REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
                response, ',', ' '), '.', ' '), '!', ' '), '+', ' '), ':', ' '),
                '-', ' '), ';', ' '), '(', ' '), ')', ' '), '/', ' '), '&', ' '),
                '?', ' '), '  ', ' '), '  ', ' '))))
        FROM Responses
    ) easyValues
    WHERE value <> ''
) actualValues
CROSS APPLY dbo.SeparateValues(value, ' ') sep
WHERE sep.col <> ''   -- skip empty pieces left behind by longer runs of spaces
GROUP BY sep.col
ORDER BY COUNT(*) DESC
Okay, so I went over the top with the nested subqueries, but this strips out all the junk characters, splits the values into words, and counts how often each word occurs, most frequent first.
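On SQL Server 2016 and later, the built-in STRING_SPLIT table-valued function can stand in for the hand-written splitter. The script below is a minimal sketch, not the answerer's code. It assumes the database compatibility level is 130 or higher (which STRING_SPLIT requires). @Responses, @StopWords and @WordCounts are hypothetical table variables standing in for real tables, and @Responses is loaded with the sample responses from the question. It also removes rubbish words such as "and" inside the query rather than afterwards.

```sql
-- Sketch only: assumes SQL Server 2016+ at compatibility level 130 or higher.
-- @Responses, @StopWords and @WordCounts are hypothetical stand-ins for real tables.
DECLARE @Responses TABLE (response VARCHAR(200));
INSERT INTO @Responses (response)
VALUES ('Red and White'), ('green'), ('White and blue'), ('Blue'), ('Dark blue');

-- Words to discard, such as "and" from the question (stored upper-case)
DECLARE @StopWords TABLE (word VARCHAR(50) PRIMARY KEY);
INSERT INTO @StopWords (word) VALUES ('AND');

DECLARE @WordCounts TABLE (Word VARCHAR(200), Frequency INT);

INSERT INTO @WordCounts (Word, Frequency)
SELECT UPPER(s.value), COUNT(*)
FROM @Responses AS r
CROSS APPLY STRING_SPLIT(r.response, ' ') AS s   -- one row per space-separated token
WHERE s.value <> ''                              -- runs of spaces produce empty tokens
  AND NOT EXISTS (SELECT 1 FROM @StopWords AS sw
                  WHERE sw.word = UPPER(s.value))
GROUP BY UPPER(s.value);

-- Most frequent first; ties broken alphabetically
SELECT Word, Frequency
FROM @WordCounts
ORDER BY Frequency DESC, Word;
```

On the sample data this returns BLUE 3, WHITE 2, then DARK, GREEN and RED with 1 each. STRING_SPLIT does not guarantee the order of its output rows, which does not matter here because the rows are only counted. It also does not strip punctuation, so for real survey text the REPLACE chain from the answer could still be applied to response before splitting.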