RegEx: Matching a specific string that is not inside an HTML tag
Problem description
<tag value='botafogo'>botafogo is the best</tag>
I need to match only the botafogo in the element text (the one followed by "is the best"), not the 'botafogo' attribute value.
My program automatically "annotates" the term in plain text, turning
botafogo is the best
into
<team attr='best'>botafogo</team> is the best
But when I then "replace all" on the word "best", I have a big problem:
<team attr='<adjective>best</adjective>'>botafogo</team> is the <adjective>best</adjective>
P.S.: I am using Java.
The best way to accomplish this is to NOT use regular expressions and to use a proper HTML parser instead. HTML is not a regular language, so doing this with regular expressions will be tedious and hard to maintain, and the result will more than likely still contain errors.
HTML parsers, on the other hand, are well suited to the job. Many of them are mature and reliable; they take care of all the little details for you and make your life much easier.
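As a rough illustration of the parser-based approach, here is a minimal sketch in Java. It makes several assumptions that are not in the question: the class name `TextNodeAnnotator` and the method `annotate` are hypothetical, and it uses the JDK's built-in XML DOM parser as a stand-in, which only works because the snippets shown are well-formed. For real-world HTML that is not well-formed, a lenient HTML parser library (jsoup is one option) would play the same role. The idea is simply to walk the text nodes and wrap matches there, so attribute values are never touched.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper: assumes the fragment is well-formed XML.
class TextNodeAnnotator {

    // Wraps each occurrence of `word` found in text nodes in <tagName>...</tagName>.
    // Matching is a plain, case-sensitive substring search.
    static String annotate(String fragment, String word, String tagName) {
        if (word.isEmpty()) {
            throw new IllegalArgumentException("word must not be empty");
        }
        try {
            // A synthetic <root> lets mixed text and elements parse as one document.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));
            Element root = doc.getDocumentElement();
            if (!root.hasChildNodes()) {
                return "";
            }

            // Collect first, then modify, so newly created wrappers are not revisited.
            List<Text> textNodes = new ArrayList<>();
            collectTextNodes(root, textNodes);
            for (Text text : textNodes) {
                wrapOccurrences(doc, text, word, tagName);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(root), new StreamResult(writer));
            String xml = writer.toString();
            // Strip the synthetic <root> wrapper again.
            return xml.substring("<root>".length(), xml.length() - "</root>".length());
        } catch (Exception e) {
            throw new IllegalArgumentException("could not process fragment as XML", e);
        }
    }

    private static void collectTextNodes(Node node, List<Text> out) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                out.add((Text) child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                collectTextNodes(child, out);
            }
        }
    }

    private static void wrapOccurrences(Document doc, Text text, String word, String tagName) {
        Text current = text;
        int index;
        while ((index = current.getData().indexOf(word)) >= 0) {
            Text match = current.splitText(index);        // match = word + remainder
            Text rest = match.splitText(word.length());   // match = word only
            Element wrapper = doc.createElement(tagName);
            match.getParentNode().replaceChild(wrapper, match);
            wrapper.appendChild(match);
            current = rest;                               // keep searching the remainder
        }
    }

    public static void main(String[] args) {
        // The attr value stays as it was; only the trailing text is wrapped.
        System.out.println(annotate("<team attr='best'>botafogo</team> is the best",
                "best", "adjective"));
    }
}
```

Because the replacement only ever sees text nodes, the `attr` value cannot be rewritten by accident, which is the failure the "replace all" approach runs into. Note that the serializer may normalize quoting (for example, emit double quotes around attribute values).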