Regular expression to find text not part of a hyperlink
Question
I'm trying to find a single regular expression that I can use to parse a block of HTML to find some specific text, but only if that text is not part of an existing hyperlink. I want to turn the non-links into links, which is easy, but identifying the non-linked ones with a single expression seems more troublesome. In the following example:
This problem is a result of BugID 12.
If you want more information, refer to <a href="/bug.aspx?id=12">BugID 12</a>.
I want a single expression to find "BugID 12" so I can link it, but I don't want to match the second one because it's already linked.
In case it matters, I'm using .NET's regular expressions.
Recommended answer
If .NET supports negative lookaheads (which it does):
(BugID 12)(?!</a>) // match BugID 12 if it is not followed by a closing anchor tag.
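To see the lookahead in action, here is a minimal sketch. It uses Python's `re` module rather than .NET (an assumption made for illustration; the `(?!...)` syntax is the same in both). The sample text comes from the question, and the replacement link target simply reuses the `/bug.aspx?id=12` URL shown there.

```python
import re

# Sample text taken from the question.
html = (
    'This problem is a result of BugID 12.\n'
    'If you want more information, refer to '
    '<a href="/bug.aspx?id=12">BugID 12</a>.'
)

# Match "BugID 12" only when it is not immediately followed by </a>.
pattern = re.compile(r'(BugID 12)(?!</a>)')

# Offsets of the unlinked occurrences.
matches = [m.start() for m in pattern.finditer(html)]

# Wrap each unlinked occurrence in an anchor; the linked one is left alone.
linked = pattern.sub(r'<a href="/bug.aspx?id=12">\1</a>', html)
print(linked)
```

Only the first occurrence is matched, so after the substitution both occurrences are linked exactly once.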
However, there is still the danger that BugID 12 will be inside an anchor alongside other text, like
<a href="...">Something BugID 12 Something</a>
But you can mostly overcome this with
(BugID 12)(?!(?:\s*\w*)*</a>) // (?:\s*\w*)* matches any word characters or spaces between the string and the end tag.
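A rough sketch of how this second pattern behaves, again in Python's `re` as a stand-in for .NET. Note one deliberate swap: the sketch writes the lookahead body as `[\s\w]*` instead of `(?:\s*\w*)*`. Both accept the same runs of whitespace and word characters, but the nested quantifiers in the original form can backtrack very heavily when a long run of words is not followed by `</a>`. The sample strings are made up for illustration.

```python
import re

# [\s\w]* accepts the same text as (?:\s*\w*)* but has no nested quantifiers,
# which keeps backtracking linear (a substitution made for this sketch).
pattern = re.compile(r'(BugID 12)(?![\s\w]*</a>)')

# Hypothetical inputs: unlinked text, text inside an anchor with other words,
# and text inside an anchor but wrapped in another tag.
samples = {
    'plain': 'This problem is a result of BugID 12 and nothing else.',
    'wrapped': '<a href="...">Something BugID 12 Something</a>',
    'nested': '<a href="..."><span>BugID 12</span></a>',
}

# True means the pattern treats the occurrence as "not linked".
results = {name: bool(pattern.search(text)) for name, text in samples.items()}
print(results)
```

The `nested` case is reported as unlinked even though it sits inside an anchor, because `<` is neither whitespace nor a word character, so the lookahead cannot reach the closing `</a>`.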
Disclaimer: Parsing HTML with regex is not reliable and should only be done as a last resort, or in the most simple of cases. I'm sure there are plenty of instances where the above expression does not perform as desired. (example: BugID 12</span></a>)
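For cases like that one, a parser-based check may be more dependable than a lookahead. The following is only a sketch under assumptions: it uses Python's standard `html.parser` (in .NET you would reach for an HTML parsing library instead), and the class name `UnlinkedTextFinder` and its attributes are hypothetical names introduced here. It reports text nodes that contain the target string while no `<a>` element is open.

```python
from html.parser import HTMLParser


class UnlinkedTextFinder(HTMLParser):
    """Collect text nodes that contain a target string outside any <a> element."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.anchor_depth = 0  # greater than 0 while inside an <a> element
        self.hits = []         # text nodes containing the target, unlinked

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == 'a' and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        # Only record text seen while no anchor is open.
        if self.anchor_depth == 0 and self.target in data:
            self.hits.append(data)


finder = UnlinkedTextFinder('BugID 12')
finder.feed(
    'This problem is a result of BugID 12. '
    'See <a href="/bug.aspx?id=12"><span>BugID 12</span></a>.'
)
finder.close()
print(finder.hits)
```

Because the anchor depth is tracked from the tags themselves, the occurrence wrapped in `<span>` inside the link is skipped, which is the case the lookahead patterns get wrong.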