Optimise SQL function to get common elements

Question

I have a function that takes two delimited strings and returns the number of common elements.

The main code of the function is as follows (@commonCount holds the value to be returned):

    SET @commonCount = (select count(*) from (
    select token from dbo.splitString(@userKeywords, ';')
    intersect
    select token from dbo.splitString(@itemKeywords, ';')) as total)

where splitString uses a while loop and charIndex to split a string into delimited tokens and inserts them into a table.

The problem is that this processes only about 100 rows per second, and at the size of my dataset it will take about 8-10 days to finish.

Each of the two strings may be up to 1500 characters long.

Is there any way I can make this fast enough to be usable?

Recommended answer

The performance problem is probably the combination of row-by-row processing (the while loop, which behaves much like a cursor) and the user-defined function.

If one of these strings is constant (such as the item keywords), you can search for each keyword independently:

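-- Delimiters are added on both sides of u.keywords so that the
-- first and last keyword in the list can also match.
select *
from users u
where charindex(';'+<item1>+';', ';'+u.keywords+';') > 0
union all
select *
from users u
where charindex(';'+<item2>+';', ';'+u.keywords+';') > 0
union all
-- ... and so on, one branch per item keyword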
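
If the goal is the number of matches per user rather than the matching rows, the same test can be folded into a single pass over the table. The following is only a minimal sketch: it assumes a userid column on users, and @item1 and @item2 are hypothetical variables standing in for the constant item keywords.

```sql
-- Hypothetical constant item keywords; replace with the real values.
declare @item1 varchar(100) = 'red';
declare @item2 varchar(100) = 'blue';

-- Each case expression contributes 1 when the keyword occurs as a
-- whole token in the user's delimited list, 0 otherwise.
select u.userid,
       case when charindex(';' + @item1 + ';', ';' + u.keywords + ';') > 0 then 1 else 0 end
     + case when charindex(';' + @item2 + ';', ';' + u.keywords + ';') > 0 then 1 else 0 end
       as commonCount
from users u;
```

This scans users once instead of once per keyword, but it only stays manageable while the list of item keywords is short and fixed.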

Alternatively, a set-based approach can work, but you have to normalize the data (a plug here for storing data in the right format to begin with). That is, you want one table that has:

userid
keyword

and another that has:

itemid
keyword

(if there are different types of items. Otherwise this is just a list of keywords.)

Then your query would look like:

select *
from userkeyword uk join
     itemkeyword ik
     on uk.keyword = ik.keyword

The SQL engine can then do what it is good at: choosing an efficient join strategy instead of running procedural code for every row.
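
To get back the count that the original function returned, the join can be grouped per user and item. This is a sketch under the assumption that the normalized tables are named userkeyword and itemkeyword with the columns listed above; count(distinct ...) is used to mirror the de-duplicating behaviour of intersect in the original function.

```sql
-- Number of distinct keywords shared by each (user, item) pair.
select uk.userid,
       ik.itemid,
       count(distinct uk.keyword) as commonCount
from userkeyword uk join
     itemkeyword ik
     on uk.keyword = ik.keyword
group by uk.userid, ik.itemid;
```

Pairs with no keyword in common do not appear in the result, so a missing row should be read as a count of zero. An index on the keyword column of each table is likely to help the join, although the best indexing depends on the actual data.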

Now, how can you create such a list? If you have only a handful of keywords per user, then you can do something like:

-- Assumes every keyword, including the last, is followed by ';'.
with keyword1 as (select u.*, charindex(';', keywords) as pos1,
                         left(keywords, charindex(';', keywords)-1) as keyword1
                  from users u
                  where charindex(';', keywords) > 0
                 ),
     -- Each later CTE reads from the previous one so it can see its position.
     keyword2 as (select k.*, charindex(';', keywords, pos1+1) as pos2,
                         substring(keywords, pos1+1,
                                   charindex(';', keywords, pos1+1) - pos1 - 1) as keyword2
                  from keyword1 k
                  where charindex(';', keywords, pos1+1) > 0
                 ),
        ...
select userid, keyword1
from keyword1
union all
select userid, keyword2
from keyword2
...
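
When the number of keywords per user is not small, chaining one CTE per keyword becomes unwieldy. On SQL Server 2016 or later (database compatibility level 130 or higher), the built-in STRING_SPLIT function can do the split in a single statement. The sketch below assumes that version is available and that a target table userkeyword(userid, keyword) already exists; neither is stated in the original question.

```sql
-- Populate the normalized table once, then query it with plain joins.
insert into userkeyword (userid, keyword)
select distinct u.userid, s.value
from users u
     cross apply string_split(u.keywords, ';') s
where s.value <> '';   -- skip empty tokens produced by leading or trailing delimiters
```

The same statement, pointed at the item table, would fill itemkeyword. The split then runs once per row at load time rather than on every comparison.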

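To find out how many such CTEs you need, get the maximum number of elements in a keyword string by counting its delimiters (shown here for the user keywords; the same query works for the item keywords). If the strings do not end with a trailing ';', add 1 to the result:

select max(len(Keywords) - len(replace(Keywords, ';', '')))
from users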
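
As a quick check of the arithmetic, the same expression can be run on a literal string. The value 'a;b;c;' is only an illustrative input, not data from the question.

```sql
-- 'a;b;c;' has 6 characters; with the delimiters removed, 'abc' has 3.
select len('a;b;c;') - len(replace('a;b;c;', ';', '')) as delimiterCount;  -- 6 - 3 = 3
```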
