SQL Server 2012: extract Regex groups

Question

I have text in my database in Markdown format. I'd like to extract links and count the number of matching links I have. I can get a listing of text blocks that contain links using a query similar to this:

SELECT post_text
FROM posts p
WHERE p.body like '%\[%](http%)%' ESCAPE '\'

How do I take the next step, though, and extract just the link portion of the text (the part in parentheses)? If I can get that, I can count the number of times each specific link appears in my dataset.

Some sample data:

"Visit [Google](http://google.com)"    -> Should return "http://google.com"
"Get an [iPhone](http://www.apple.com) (I like it better than Android)"   -> Should return "http://www.apple.com"
"[Example](http://example.com)"    -> Should return "http://example.com"
"This is a message"    -> Nothing to return on this one, no link
"I like cookies (chocolate chip)"  -> Nothing to return on this one, no link
"[Frank] says 'Hello'" -> Nothing to return on this one, no link

I am using SQL Server 2012 (if there are differences between versions in this regard).

Recommended answer

Assuming the actual data is no more complex than the stated examples, this should work without resorting to RegEx:

DECLARE @posts TABLE
(
   post_id INT NOT NULL IDENTITY(1, 1),
   post_text NVARCHAR(4000) NOT NULL,
   body NVARCHAR(2048) NULL
);
INSERT INTO @posts (post_text, body) VALUES (N'first',
                                           N'Visit [Google](http://google.com)');
INSERT INTO @posts (post_text, body) VALUES (N'second',
                                           N'Get an [iPhone](http://www.apple.com)');
INSERT INTO @posts (post_text, body) VALUES (N'third',
                                           N'[Example](http://example.com)');
INSERT INTO @posts (post_text, body) VALUES (N'fourth',
                                           N'This is a message');
INSERT INTO @posts (post_text, body) VALUES (N'fifth',
                                           N'I like cookies (chocolate chip)');
INSERT INTO @posts (post_text, body) VALUES (N'sixth',
                                           N'[Frankie] says ''Relax''');
INSERT INTO @posts (post_text, body) VALUES (N'seventh',
                                           NULL);


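-- Find the "](http" that opens the link target, then look for the closing ")"
-- only after that point, so an earlier ")" elsewhere in the text is ignored.
SELECT p.post_text,
       SUBSTRING(
                  p.body,
                  CHARINDEX(N'](http', p.body) + 2,
                  CHARINDEX(N')', p.body, CHARINDEX(N'](http', p.body) + 2)
                    - (CHARINDEX(N'](http', p.body) + 2)
                ) AS [URL]
FROM   @posts p
WHERE  p.body LIKE N'%\[%](http%)%' ESCAPE N'\';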

Output:

post_text  URL
first      http://google.com
second     http://www.apple.com
third      http://example.com
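
The question also asks how often each link occurs. One way to get there is to group on the extracted URL. The following is a minimal sketch, not part of the original answer: the table variable, its sample rows, and the names url_start, url_end, and LinkCount are assumptions made for illustration. Like the query above, it only picks up the first link in each row.

```sql
-- Hypothetical sample data, for illustration only.
DECLARE @posts TABLE
(
   post_id INT NOT NULL IDENTITY(1, 1),
   body NVARCHAR(2048) NULL
);
INSERT INTO @posts (body) VALUES
   (N'Visit [Google](http://google.com)'),
   (N'Search with [Google](http://google.com) again'),
   (N'[Example](http://example.com)'),
   (N'This is a message');

SELECT   x.url    AS [URL],
         COUNT(*) AS [LinkCount]
FROM     @posts p
-- NULLIF turns "not found" (0) into NULL, so SUBSTRING yields NULL instead of
-- failing on a negative length if a non-matching row is evaluated before the filter.
CROSS APPLY (SELECT NULLIF(CHARINDEX(N'](http', p.body), 0) + 2 AS url_start) s
CROSS APPLY (SELECT NULLIF(CHARINDEX(N')', p.body, s.url_start), 0) AS url_end) e
CROSS APPLY (SELECT SUBSTRING(p.body, s.url_start, e.url_end - s.url_start) AS url) x
WHERE    p.body LIKE N'%\[%](http%)%' ESCAPE N'\'
GROUP BY x.url
ORDER BY [LinkCount] DESC, x.url;
```

With the sample rows in this sketch, http://google.com should come back with a count of 2 and http://example.com with a count of 1. The CROSS APPLY steps exist only to name the intermediate positions so that the CHARINDEX expressions are not repeated.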

PS:
If you really want to use Regular Expressions, they can only be done via SQLCLR. You can write your own or download pre-done libraries. I wrote one such library, SQL#, that has a Free version that includes the RegEx functions. But those should only be used if a T-SQL solution cannot be found, which so far is not the case here.
