Optimise SQL function to get common elements
Question
I have a function that takes two delimited strings and returns the number of common elements.
The main code of the function is (@commonCount is the return value):
SET @commonCount = (select count(*) from (
select token from dbo.splitString(@userKeywords, ';')
intersect
select token from dbo.splitString(@itemKeywords, ';')) as total)
where splitString uses a while loop and charIndex to split a string into delimited tokens and inserts them into a table.
The problem I am having is that this only processes about 100 rows per second, and given the size of my dataset it will take about 8-10 days to finish.
Each of the two strings may be up to 1500 characters long.
Is there any way I can make this fast enough to be usable?
Recommended answer
The performance problem is probably the combination of row-by-row processing (the while loop behaves much like a cursor) and the user-defined function.
If one of these strings is constant (such as the item keywords), you can search for each keyword independently:
select *
from users u
where charindex(';'+<item1>+';', ';'+u.keywords+';') > 0
union all
select *
from users u
where charindex(';'+<item2>+';', ';'+u.keywords+';') > 0
union all
...
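The delimiter-wrapping idea behind this charindex test can be checked outside the database. The following is a minimal Python sketch, not part of the original answer; the helper name has_keyword and the sample strings are made up for illustration. It wraps both the item and the keyword list in ';' so that only whole tokens match, including the first and last one.

```python
def has_keyword(keywords: str, item: str, delimiter: str = ";") -> bool:
    """Return True if item is one whole token of the delimited keywords string.

    Hypothetical helper: mirrors a charindex test on delimiter-wrapped strings.
    """
    wrapped_list = delimiter + keywords + delimiter  # ";red;blue;green;"
    wrapped_item = delimiter + item + delimiter      # ";green;"
    return wrapped_item in wrapped_list


first = has_keyword("red;blue;green", "red")      # first token
last = has_keyword("red;blue;green", "green")     # last token
partial = has_keyword("red;blue;green", "gre")    # only a prefix of a token
```

Without the delimiter appended to the end of the keyword list, the last token would never match, because nothing follows it.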
Alternatively, a set-based approach can work, but you have to normalize the data (a plug here for having the data in the right format to begin with). That is, you want one table that has:
userid
keyword
and another that has:
itemid
keyword
(if there are different types of items; otherwise this is just a list of keywords.)
Then your query would look like:
select *
from userkeyword uk join
itemkeyword ik
on uk.keyword = ik.keyword
And the SQL engine would perform its magic.
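To get back to the number the original function returns, the join can be grouped per user/item pair. Below is a minimal sketch, assuming Python's built-in sqlite3 as a stand-in for SQL Server; the table and column names follow the layout above, while the sample rows, the indexes, and the commonCount alias are invented for illustration. count(distinct ...) is used so that repeated keywords are counted once, as INTERSECT would.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table userkeyword (userid integer, keyword text);
    create table itemkeyword (itemid integer, keyword text);
    -- Indexes on the join column let the engine avoid full scans.
    create index ix_userkeyword on userkeyword (keyword, userid);
    create index ix_itemkeyword on itemkeyword (keyword, itemid);
""")

# Invented sample data.
conn.executemany("insert into userkeyword values (?, ?)",
                 [(1, "red"), (1, "blue"), (1, "green"), (2, "blue")])
conn.executemany("insert into itemkeyword values (?, ?)",
                 [(10, "blue"), (10, "green"), (20, "red")])

# One row per (user, item) pair that shares at least one keyword.
common_counts = conn.execute("""
    select uk.userid, ik.itemid, count(distinct uk.keyword) as commonCount
    from userkeyword uk join
         itemkeyword ik
         on uk.keyword = ik.keyword
    group by uk.userid, ik.itemid
    order by uk.userid, ik.itemid
""").fetchall()

for userid, itemid, common in common_counts:
    print(userid, itemid, common)
```

Pairs with no keyword in common do not appear in the result at all, so a count of zero has to be inferred from a missing row.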
Now, how can you create such a list? If you have only a handful of keywords per user, then you can do something like:
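-- ';' is appended before searching so the last keyword (no trailing ';') is found too
with keyword1 as (select u.*, charindex(';', keywords + ';') as pos1,
                         left(keywords, charindex(';', keywords + ';') - 1) as keyword1
                  from users u
                  where len(keywords) > 0
                 ),
     -- each later CTE reads from the previous one so that pos1 is in scope
     keyword2 as (select k.*, charindex(';', keywords + ';', pos1 + 1) as pos2,
                         substring(keywords, pos1 + 1,
                                   charindex(';', keywords + ';', pos1 + 1) - pos1 - 1) as keyword2
                  from keyword1 k
                  where pos1 < len(keywords)
                 ),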
...
select userid, keyword1
from keyword1
union all
select userid, keyword2
from keyword2
...
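If chaining one CTE per keyword position becomes unwieldy, the same (id, keyword) rows can also be produced once in client code and bulk-inserted. This is a different technique from the CTE chain above, sketched in Python under the assumption that rows arrive as (id, delimited string) pairs; the function name normalize and the sample rows are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def normalize(rows: Iterable[Tuple[int, str]],
              delimiter: str = ";") -> Iterator[Tuple[int, str]]:
    """Yield one (id, keyword) pair per distinct, non-empty token in each row."""
    for row_id, keywords in rows:
        seen = set()
        for token in keywords.split(delimiter):
            # Skip empty tokens (e.g. from a trailing ';') and repeats.
            if token and token not in seen:
                seen.add(token)
                yield row_id, token


# Invented sample input: (userid, keywords) pairs.
pairs = list(normalize([(1, "red;blue;red"), (2, "green;")]))
```

The resulting pairs can be loaded into the userid/keyword table with whatever bulk-insert mechanism the database offers.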
To find out how many keywordN CTEs are needed, count the delimiters in the longest keyword list; the maximum number of elements is that count plus one when the lists have no trailing delimiter:
select max(len(keywords) - len(replace(keywords, ';', '')))
from users
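The arithmetic in this query, length minus length with the delimiters removed, can be sanity-checked with a few lines of Python. The sample strings are invented, and the "+ 1" assumes the lists have no trailing ';'.

```python
# Invented sample keyword lists, without trailing delimiters.
keyword_lists = ["red;blue;green", "blue", "a;b"]

# Same idea as len(keywords) - len(replace(keywords, ';', '')) in SQL:
# removing the delimiters shortens the string by the number of delimiters.
max_delimiters = max(len(k) - len(k.replace(";", "")) for k in keyword_lists)

# With no trailing delimiter, n delimiters separate n + 1 elements.
max_elements = max_delimiters + 1
```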