Regular expression to find text not part of a hyperlink


Question

I'm trying to find a single regular expression that I can use to parse a block of HTML to find some specific text, but only if that text is not part of an existing hyperlink. I want to turn the non-links into links, which is easy, but identifying the non-linked ones with a single expression seems more troublesome. In the following example:

  This problem is a result of BugID 12.
  If you want more information, refer to <a href="/bug.aspx?id=12">BugID 12</a>.

I want a single expression to find "BugID 12" so I can link it, but I don't want to match the second one because it's already linked.

In case it matters, I'm using .NET's regular expressions.

Recommended answer

If .NET supports negative lookaheads (which I think it does):

(BugID 12)(?!</a>)  // match BugID 12 if it is not followed by a closing anchor tag.
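To see the lookahead at work on the sample from the question, here is a minimal sketch in Python, whose `re` module uses the same `(?!...)` syntax as .NET for this pattern. The variable names are only illustrative, and the link target is copied from the question's example.

```python
import re

# Sample text taken from the question.
html = (
    'This problem is a result of BugID 12.\n'
    'If you want more information, refer to '
    '<a href="/bug.aspx?id=12">BugID 12</a>.'
)

# Match "BugID 12" only when it is not immediately followed by </a>.
pattern = re.compile(r'(BugID 12)(?!</a>)')

# Start offsets of every match; only the unlinked occurrence qualifies.
matches = [m.start() for m in pattern.finditer(html)]

# Wrap each unlinked occurrence in an anchor (\1 is the captured text).
linked = pattern.sub(r'<a href="/bug.aspx?id=12">\1</a>', html)
print(linked)
```

The second occurrence is skipped because `</a>` follows it directly, so the substitution leaves the existing link untouched.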

However, there is still the danger that BugID 12 will be inside an anchor like

<a href="...">Something BugID 12 Something</a>

But you can mostly overcome this with

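(BugID 12)(?![\s\w]*</a>)  // [\s\w]* matches any word characters or whitespace between the string and the end tag; a single character class avoids the heavy backtracking that nested quantifiers such as (?:\s*\w*)* can cause.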
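The difference between the two expressions can be checked with a short Python sketch. It writes the extended lookahead as `[\s\w]*`, which accepts the same runs of whitespace and word characters as `(?:\s*\w*)*`; the variable names are made up for illustration.

```python
import re

# "BugID 12" sits inside an anchor, with other words around it.
inside_anchor = '<a href="...">Something BugID 12 Something</a>'
# "BugID 12" in plain text, not linked.
plain = 'This problem is a result of BugID 12.'

simple = re.compile(r'(BugID 12)(?!</a>)')
extended = re.compile(r'(BugID 12)(?![\s\w]*</a>)')

# The simple pattern wrongly matches text that is already inside a link.
simple_hits_anchor = simple.search(inside_anchor) is not None

# The extended pattern looks past trailing words and spaces to the </a>.
extended_hits_anchor = extended.search(inside_anchor) is not None

# Unlinked text is still found.
extended_hits_plain = extended.search(plain) is not None

print(simple_hits_anchor, extended_hits_anchor, extended_hits_plain)
```

The extended lookahead only skips whitespace and word characters, so any other character between the text and `</a>` (punctuation, another tag) ends the scan early.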

Disclaimer: Parsing HTML with regex is not reliable and should only be done as a last resort, or in the simplest of cases. I'm sure there are plenty of cases where the above expression does not behave as desired (for example: BugID 12</span></a>).
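The failure case mentioned in the disclaimer can be reproduced with a small Python sketch. The markup below is a hypothetical link built around that example, and the lookahead is written in the equivalent `[\s\w]*` form.

```python
import re

pattern = re.compile(r'(BugID 12)(?![\s\w]*</a>)')

# Already linked, but wrapped in a <span> inside the anchor.
nested = '<a href="/bug.aspx?id=12"><span>BugID 12</span></a>'

# The lookahead stops at "<" of </span>, never sees </a>, and so
# the expression reports a match even though the text is linked.
false_positive = pattern.search(nested) is not None

print(false_positive)
```

Here the text would be wrapped in a second link, which is the kind of result the disclaimer warns about.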
