How do I extract one or multiple URLs from a text (NVARCHAR(MAX)) column

Question

I have a data column in my table, and in this column there can be zero, one, or multiple URLs along with other text in each row. I would like to extract these URLs into a new dataset containing only the URLs.

Why? Because I want to add some of these URLs to a block list in my DB to prevent spam.

For example, I have this text in the data column:

hmaruqbtufcvdlfu, <a href="httx://portugal-forex.com/">Day forex signal strategy trading</a>, KzxiIIO, [url=httx://portugal-forex.com/]Forex Broker[/url], mtNZQDi, httx://portugal-forex.com/ The best forex broker, IBWlBzg, <a href="httx://phen375treatment.com/">Avantage inconveniant phen 375</a>, ApEuXTp, [url=httx://phen375treatment.com/]Phen375[/url], QDVLpSn, httx://phen375treatment.com/ Where to buy phen 375, Fnwpugj, <a href="httx://priligy2000.org/">Priligy t</a>, zwRZhIC, [url=httx://priligy2000.org/]Order priligy[/url], FBgSaWs, httx://priligy2000.org/ Priligy buy online, FsemWnW, <a href="httx://ossorio.org/">Online Casino</a>, aOBtTaK, [url=httx://ossorio.org/]Online Casino[/url], oMMMacf, httx://ossorio.org/ Free online casino bounuses, occFLyZ, <a href="httx://paroxetine247.com/">Paroxetine adema</a>, xvrIdnq, [url=httx://paroxetine247.com/]Paroxetine depression[/url], MLSRAXX, httx://paroxetine247.com/ Paroxetine dark skin, GLYTcZY, <a href="httx://resolvedisputes.org/">Fioricet prescription online</a>, PmEMaMA, [url=httx://resolvedisputes.org/]Fioricet wcodiene for headache[/url], vPlKLhq, httx://resolvedisputes.org/ Online pharmacy fioricet, fxfhRcV.

Then I want all URLs in the text:

httx://portugal-forex.com/
httx://phen375treatment.com/
httx://priligy2000.org/
And so on.

I really don't know where to start doing this in SQL.

Recommended answer

Here is an example. I search for the string from "httx://" to the first "/" that follows it:

In any case, you will need to go through the rows one by one.

Put the code into a function:

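-- Assumes a schema named Temporary already exists
CREATE FUNCTION Temporary.getLinksFromText (@Tekstas NVARCHAR(MAX))
RETURNS @Data TABLE(TheLink NVARCHAR(500))
AS
BEGIN

    DECLARE @FirstIndexOfChar INT,
            @LastIndexOfChar INT,
            @LengthOfStringBetweenChars INT,
            @String NVARCHAR(MAX)   -- NVARCHAR to match the input and keep Unicode characters

    -- Position of the first "httx://" (0 if there is none)
    SET @FirstIndexOfChar = CHARINDEX('httx://', @Tekstas)

    WHILE @FirstIndexOfChar > 0
    BEGIN

        -- First "/" after the 7-character "httx://" prefix
        SET @LastIndexOfChar = CHARINDEX('/', @Tekstas, @FirstIndexOfChar + 7)

        -- No closing "/": stop, otherwise SUBSTRING would be given a negative length
        IF @LastIndexOfChar = 0 BREAK

        SET @LengthOfStringBetweenChars = @LastIndexOfChar - @FirstIndexOfChar + 1

        SET @String = SUBSTRING(@Tekstas, @FirstIndexOfChar, @LengthOfStringBetweenChars)
        INSERT INTO @Data (TheLink) VALUES (@String);

        -- Drop everything before the closing "/" and look for the next link
        SET @Tekstas = SUBSTRING(@Tekstas, @LastIndexOfChar, LEN(@Tekstas))
        SET @FirstIndexOfChar = CHARINDEX('httx://', @Tekstas)

    END

    RETURN
END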
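
The function above stops at the first "/" after "httx://", so it returns only the scheme and host, and it relies on every link having a trailing slash. If you also need paths, or links without a trailing slash, one possible variant is to end each URL at the next delimiter instead. This is only a rough sketch under assumptions of mine: the name dbo.getUrlsFromText, the delimiter set (space, double quote, "]" and "<") and the 2000-character column are not from the original answer, and tabs or line breaks are not treated as delimiters here.

```sql
-- Hypothetical variant: the function name, delimiter set and column size are assumptions.
CREATE FUNCTION dbo.getUrlsFromText (@Text NVARCHAR(MAX))
RETURNS @Data TABLE (TheLink NVARCHAR(2000))
AS
BEGIN
    DECLARE @Start INT, @End INT;

    -- Turn the assumed delimiters (", ], <) into spaces. Each replacement swaps
    -- one character for one character, so positions in @Text do not shift.
    SET @Text = REPLACE(REPLACE(REPLACE(@Text, N'"', N' '), N']', N' '), N'<', N' ');

    SET @Start = CHARINDEX(N'httx://', @Text);

    WHILE @Start > 0
    BEGIN
        -- The URL ends just before the next space, or at the end of the text.
        SET @End = CHARINDEX(N' ', @Text, @Start);
        IF @End = 0 SET @End = LEN(@Text) + 1;

        -- LEFT keeps an overlong match from overflowing the column.
        INSERT INTO @Data (TheLink)
        VALUES (LEFT(SUBSTRING(@Text, @Start, @End - @Start), 2000));

        -- Continue searching after the URL that was just extracted.
        SET @Start = CHARINDEX(N'httx://', @Text, @End);
    END

    RETURN;
END
```

Because the search position only moves forward, the text itself never has to be cut down inside the loop. Trailing punctuation such as "," or "." directly after a link would still be kept, so you may want to trim that separately.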

Create some test data:

CREATE TABLE  #Data(weLink NVARCHAR(MAX));
INSERT INTO #Data VALUES 
('hmaruqbtufcvdlfu, <a href="httx://portugal-forex.com/">Day forex signal strategy trading</a>, KzxiIIO, [url=httx://portugal-forex.com/]Forex Broker[/url], mtNZQDi, httx://portugal-forex.com/ The best forex broker, IBWlBzg, <a href="httx://phen375treatment.com/">Avantage inconveniant phen 375</a>, ApEuXTp, [url=httx://phen375treatment.com/]Phen375[/url], QDVLpSn, httx://phen375treatment.com/ Where to buy phen 375, Fnwpugj, <a href="httx://priligy2000.org/">Priligy t</a>, zwRZhIC, [url=httx://priligy2000.org/]Order priligy[/url], FBgSaWs, httx://priligy2000.org/ Priligy buy online, FsemWnW, <a href="httx://ossorio.org/">Online Casino</a>, aOBtTaK, [url=httx://ossorio.org/]Online Casino[/url], oMMMacf, httx://ossorio.org/ Free online casino bounuses, occFLyZ, <a href="httx://paroxetine247.com/">Paroxetine adema</a>, xvrIdnq, [url=httx://paroxetine247.com/]Paroxetine depression[/url], MLSRAXX, httx://paroxetine247.com/ Paroxetine dark skin, GLYTcZY, <a href="httx://resolvedisputes.org/">Fioricet prescription online</a>, PmEMaMA, [url=httx://resolvedisputes.org/]Fioricet wcodiene for headache[/url], vPlKLhq, httx://resolvedisputes.org/ Online pharmacy fioricet, fxfhRcV.'),
('hmaruqbtufcvdlfu, <a href="httx://portugal-forex.com/">Day forex signal strategy trading</a>, KzxiIIO, [url=httx://portugal-forex.com/]Forex Broker[/url], mtNZQDi, httx://portugal-forex.com/ The best forex broker, IBWlBzg, <a href="httx://phen375treatment.com/">Avantage inconveniant phen 375</a>, ApEuXTp, [url=httx://phen375treatment.com/]Phen375[/url], QDVLpSn, httx://phen375treatment.com/ Where to buy phen 375, Fnwpugj, <a href="httx://priligy2000.org/">Priligy t</a>, zwRZhIC, [url=httx://priligy2000.org/]Order priligy[/url], FBgSaWs, httx://priligy2000.org/ Priligy buy online, FsemWnW, <a href="httx://ossorio.org/">Online Casino</a>, aOBtTaK, [url=httx://ossorio.org/]Online Casino[/url], oMMMacf, httx://ossorio.org/ Free online casino bounuses, occFLyZ, <a href="httx://paroxetine247.com/">Paroxetine adema</a>, xvrIdnq, [url=httx://paroxetine247.com/]Paroxetine depression[/url], MLSRAXX, httx://paroxetine247.com/ Paroxetine dark skin, GLYTcZY, <a href="httx://resolvedisputes.org/">Fioricet prescription online</a>, PmEMaMA, [url=httx://resolvedisputes.org/]Fioricet wcodiene for headache[/url], vPlKLhq, httx://resolvedisputes.org/ Online pharmacy fioricet, fxfhRcV.')

And you can execute it like this (without a cursor):

SELECT allLinks.*
FROM #Data AS D
OUTER APPLY Temporary.getLinksFromText (D.weLink) AS allLinks
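
Since the goal is a block list, you probably want each link only once. A small sketch of how that could look, reusing the function and the #Data table from above (the DISTINCT and CROSS APPLY choices are my suggestion, not part of the original answer):

```sql
-- DISTINCT collapses repeated links;
-- CROSS APPLY skips rows that contain no link at all
-- (OUTER APPLY would return a NULL row for them).
SELECT DISTINCT allLinks.TheLink
FROM #Data AS D
CROSS APPLY Temporary.getLinksFromText(D.weLink) AS allLinks
ORDER BY allLinks.TheLink;
```

With the two test rows above, this should return the 6 distinct links instead of all 36 matches.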
