Regex - Match an attribute in HTML code


Problem description

I have a problem matching HTML attributes (in various HTML tags) with a regex. To do so, I use this pattern:

myAttr=\"([^']*)\"

HTML snippet:

<img alt="" src="1-p2.jpg" myAttr="http://example.com" class="alignleft" />

It selects the text from myAttr all the way to the closing />, but I need to select only myAttr="..." (that is, "http://example.com").

Solution

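You have an apostrophe (') inside your character class, but you wanted a double quote ("). Because [^']* matches every character except an apostrophe, it runs straight past the closing double quote of myAttr and only stops at the last double quote in the tag.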

myAttr=\"([^"]*)\"
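
To make the difference concrete, here is a minimal sketch in Python (the question does not say which language is in use, so the `re` module is an assumption) that runs both patterns against the snippet from the question:

```python
import re

html = '<img alt="" src="1-p2.jpg" myAttr="http://example.com" class="alignleft" />'

# Original pattern: [^']* excludes only apostrophes, so the greedy match
# runs past myAttr's closing quote and ends at the last double quote.
buggy = re.search("myAttr=\"([^']*)\"", html)

# Corrected pattern: [^"]* stops at the first closing double quote.
fixed = re.search('myAttr="([^"]*)"', html)

print(buggy.group(1))  # http://example.com" class="alignleft
print(fixed.group(1))  # http://example.com
```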

That said, you really shouldn't be parsing HTML with regexes. (Sorry to link to that answer again. There are other answers to that question that are more of the "if you know what you are doing..." variety. But it is good to be aware of.)

Note that even if you limit your regexing to just attributes you have a lot to consider:

  • Be careful not to match inside of comments.
  • Be careful not to match inside of CDATA sections.
  • What if attributes are bracketed with single quotes instead of double quotes?
  • What if attributes have no quotes at all?
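
As a rough illustration of the points above, the pattern can be extended to accept double-quoted, single-quoted, and unquoted values. `ATTR_RE` and `find_values` are hypothetical names, and the sketch assumes Python's `re` module. Note that it still reports the value inside the comment, which a regex alone cannot easily rule out:

```python
import re

# Hypothetical pattern: the value may be double-quoted, single-quoted,
# or unquoted (a run of characters without whitespace, quotes, or '>').
ATTR_RE = re.compile(r'''myAttr\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))''')

def find_values(html):
    # Only one of the three groups participates in any given match.
    return [m.group(1) or m.group(2) or m.group(3) or ""
            for m in ATTR_RE.finditer(html)]

samples = '<img myAttr="a"> <img myAttr=\'b\'> <img myAttr=c> <!-- <img myAttr="d"> -->'
print(find_values(samples))  # ['a', 'b', 'c', 'd'] ('d' comes from the comment)
```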

This is why pre-built, serious parsers are generally called for.
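
For comparison, here is a minimal sketch using Python's standard-library `html.parser` (one possible parser, not the one the answer has in mind). `AttrCollector` and `collect_attr` are hypothetical names. The parser handles the quoting styles itself and does not report tags inside comments:

```python
from html.parser import HTMLParser

class AttrCollector(HTMLParser):
    """Collects the values of one attribute from every start tag."""

    def __init__(self, name):
        super().__init__()
        # html.parser lowercases attribute names, so compare in lowercase.
        self.name = name.lower()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        for key, value in attrs:
            if key == self.name:
                self.values.append(value)

def collect_attr(html, name):
    parser = AttrCollector(name)
    parser.feed(html)
    parser.close()
    return parser.values

doc = ('<!-- <img myAttr="hidden"> -->'
       '<img alt="" src="1-p2.jpg" myAttr="http://example.com" class="alignleft" />'
       "<a myAttr='x'>t</a><p myAttr=y>")
print(collect_attr(doc, "myAttr"))  # ['http://example.com', 'x', 'y']
```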
