RegEx: Matching a specific string that is not inside an HTML tag


Problem description

<tag value='botafogo'> botafogo is the best </tag>

I need to match only the botafogo in the element text ("botafogo is the best"), not the 'botafogo' attribute value.

My program automatically "annotates" the term in plain text, turning:

botafogo is the best 

into

<team attr='best'>botafogo</team> is the best 

When I then "replace all" occurrences of the word "best", I run into a big problem, because the occurrence inside the attribute value is annotated too:

<team attr='<adjective>best</adjective>'>botafogo</team> is the <adjective>best</adjective>

P.S.: I'm using Java.

Solution

The best way to accomplish this is NOT to use regular expressions but a proper HTML parser. HTML is not a regular language, and doing this with regular expressions will be tedious, hard to maintain, and more than likely still contain various errors.

HTML parsers, on the other hand, are well suited to the job. Many of them are mature and reliable; they take care of every little detail for you and make your life much easier.
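As a rough illustration of the parser-based approach, here is a minimal sketch that uses only the XML DOM parser shipped with the JDK. It assumes the annotated markup is well-formed XML (as in the question's examples); for real-world HTML a dedicated HTML parser would be the better choice. The class name TextNodeAnnotator, the method annotate and the temporary <root> wrapper element are made up for this example. The idea is to parse the fragment, visit text nodes only, and wrap each occurrence of the term in a new element, so attribute values are never touched.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper: annotates a term in text nodes only, never in attributes.
final class TextNodeAnnotator {

    static String annotate(String fragment, String term, String tagName) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("term must not be empty");
        }
        try {
            // Assumption: the fragment is well-formed XML. A temporary <root>
            // wrapper gives the parser the single root element it requires.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));

            // Collect the text nodes first so the tree is not modified while walking it.
            List<Text> texts = new ArrayList<>();
            collectTextNodes(doc.getDocumentElement(), texts);
            for (Text text : texts) {
                wrapOccurrences(doc, text, term, tagName);
            }

            // Serialize without the XML declaration, then strip the wrapper again.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc.getDocumentElement()), new StreamResult(out));
            String xml = out.toString();
            if (!xml.endsWith("</root>")) {
                return ""; // empty fragment, serialized as a self-closing <root/>
            }
            return xml.substring("<root>".length(), xml.length() - "</root>".length());
        } catch (Exception e) {
            throw new IllegalStateException("could not annotate fragment", e);
        }
    }

    private static void collectTextNodes(Node node, List<Text> out) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                out.add((Text) child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                collectTextNodes(child, out);
            }
        }
    }

    private static void wrapOccurrences(Document doc, Text text, String term, String tagName) {
        int index;
        while ((index = text.getData().indexOf(term)) >= 0) {
            Text match = text.splitText(index);          // node starting at the term
            Text rest = match.splitText(term.length());  // node holding what follows the term
            Element wrapper = doc.createElement(tagName);
            match.getParentNode().replaceChild(wrapper, match);
            wrapper.appendChild(match);
            text = rest;                                 // keep searching after the match
        }
    }

    public static void main(String[] args) {
        String annotated = "<team attr='best'>botafogo</team> is the best";
        System.out.println(annotate(annotated, "best", "adjective"));
    }
}
```

With the input from the question, only the "best" in the running text is wrapped in an adjective element; the attr value of the team element is left as it was. Note that the serializer may normalize details such as attribute quoting, so the output is equivalent markup rather than a character-for-character copy of the input.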
