Regex to match HTML tags with specific attributes


Question

I am trying to match all HTML tags that do not have the attribute "term" or "range".

Here is a sample of the HTML:

<span class="inline prewrap strong">DATE:</span>    12/01/10
<span class="inline prewrap strong">MR:</span>  1234567
<span class="inline prewrap strong">DOB:</span> 12/01/65
<span class="inline prewrap strong">HISTORY OF PRESENT ILLNESS:</span>  Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum

<span class="inline prewrap strong">MEDICATIONS:</span>  <span term="Advil" range="true">Advil </span>and Ibuprofen.

My regex is: <(.*?)((?!\bterm\b).)>

Unfortunately, this matches all the tags. It would also be nice if the inner text were not matched, because I need to filter out all the tags except the ones with that specific attribute.
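To see why the attempt above matches every tag, here is a minimal Python sketch (Python is an assumption; the question does not name a language). The negative lookahead is applied only to the single character just before the closing ">", so a "term" attribute earlier in the tag is never checked.

```python
import re

# The pattern from the question, unchanged.
ATTEMPT = re.compile(r'<(.*?)((?!\bterm\b).)>')

# One line taken from the sample HTML in the question.
html = (
    '<span class="inline prewrap strong">MEDICATIONS:</span>  '
    '<span term="Advil" range="true">Advil </span>and Ibuprofen.'
)

# (.*?) lazily swallows everything up to the last character of the tag,
# and the lookahead only guards that one remaining character.
matches = [m.group(0) for m in ATTEMPT.finditer(html)]
for tag in matches:
    print(tag)
```

All four tags are printed, including the one carrying term and range, which is the behavior described in the question.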

Recommended answer

If regex is your thing for this, this works for me. (Note: filtering out comments, doctypes, and other entities is not included.
Another caveat: tags can also be embedded in scripts, comments, and other constructs.)

span tag (with attributes), no term|range attributes

'<span
  (?=\s)
  (?! (?:[^>"\']|(?>".*?"|\'.*?\'))*? (?<=\s) (?:term|range) \s*= )
  \s+ (?:".*?"|\'.*?\'|[^>]*?)+ 
>'

any tag (with attributes), no term|range attributes

'<[A-Za-z_:][\w:.-]*
  (?=\s)
  (?! (?:[^>"\']|(?>".*?"|\'.*?\'))*? (?<=\s) (?:term|range) \s*= )
  \s+ (?:".*?"|\'.*?\'|[^>]*?)+ 
>'

any tag (with or without attributes), no term|range attributes

'<
  (?:
    [A-Za-z_:][\w:.-]*
    (?=\s)
    (?! (?:[^>"\']|(?>".*?"|\'.*?\'))*? (?<=\s) (?:term|range) \s*= )
    \s+ (?:".*?"|\'.*?\'|[^>]*?)+ 
  |
    /?[A-Za-z_:][\w:.-]*\s*/?
  )
>'

Update

Alternative that avoids the (?>) atomic-group construct.
The regexes below match tags that have no term|range attributes.
Flags: (g) global and (s) dotall

span tag with attributes
Link: http://regexr.com?2vrjr
Regex: <span(?=\s)(?!(?:[^>"\']|"[^"]*"|\'[^\']*\')*?(?<=\s)(?:term|range)\s*=)(?!\s*/?>)\s+(?:".*?"|\'.*?\'|[^>]*?)+>
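As a rough illustration, the span pattern can be tried in Python like this. Python and the variable names are assumptions, not part of the answer; re.DOTALL stands in for the (s) flag, and re.findall plays the role of the (g) flag.

```python
import re

# The "span tag with attributes" pattern, copied as-is into a raw string.
SPAN_WITHOUT_TERM_OR_RANGE = re.compile(
    r'''<span(?=\s)(?!(?:[^>"\']|"[^"]*"|\'[^\']*\')*?(?<=\s)(?:term|range)\s*=)(?!\s*/?>)\s+(?:".*?"|\'.*?\'|[^>]*?)+>''',
    re.DOTALL,
)

# Two lines taken from the sample HTML in the question.
html = '''<span class="inline prewrap strong">DATE:</span>    12/01/10
<span class="inline prewrap strong">MEDICATIONS:</span>  <span term="Advil" range="true">Advil </span>and Ibuprofen.'''

# The pattern has no capturing groups, so findall returns whole matches.
matches = SPAN_WITHOUT_TERM_OR_RANGE.findall(html)
for tag in matches:
    print(tag)
```

Only the two class-only opening span tags are reported. The span carrying term and range is rejected by the first negative lookahead, and closing tags are not covered by this pattern at all.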

Any tag with attributes
Link: http://regexr.com?2vrju
Regex: <[A-Za-z_:][\w:.-]*(?=\s)(?!(?:[^>"\']|"[^"]*"|\'[^\']*\')*?(?<=\s)(?:term|range)\s*=)(?!\s*/?>)\s+(?:".*?"|\'.*?\'|[^>]*?)+>

Any tag with or without attributes
Link: http://regexr.com?2vrk1
Regex: <(?:[A-Za-z_:][\w:.-]*(?=\s)(?!(?:[^>"\']|"[^"]*"|\'[^\']*\')*?(?<=\s)(?:term|range)\s*=)(?!\s*/?>)\s+(?:".*?"|\'.*?\'|[^>]*?)+|/?[A-Za-z_:][\w:.-]*\s*/?)>
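Since the goal in the question is to filter tags out, one possible use of the any-tag pattern is a substitution. This is only a sketch in Python (an assumed language), with hypothetical variable names.

```python
import re

# The "any tag with or without attributes" pattern, copied as-is.
ANY_TAG_WITHOUT_TERM_OR_RANGE = re.compile(
    r'''<(?:[A-Za-z_:][\w:.-]*(?=\s)(?!(?:[^>"\']|"[^"]*"|\'[^\']*\')*?(?<=\s)(?:term|range)\s*=)(?!\s*/?>)\s+(?:".*?"|\'.*?\'|[^>]*?)+|/?[A-Za-z_:][\w:.-]*\s*/?)>''',
    re.DOTALL,
)

# One line taken from the sample HTML in the question.
html = (
    '<span class="inline prewrap strong">MEDICATIONS:</span>  '
    '<span term="Advil" range="true">Advil </span>and Ibuprofen.'
)

# Delete every tag the pattern matches; text between tags is untouched.
stripped = ANY_TAG_WITHOUT_TERM_OR_RANGE.sub('', html)
print(stripped)
```

Note that the second alternative matches every attribute-less tag, closing tags included, so the </span> that belongs to the kept term/range span is removed as well. Pairing closing tags with their openers needs extra bookkeeping that the regex alone does not provide.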

Match every tag except those that have term="occasionally"

Link: http://regexr.com?2vrka
Regex: <(?:[A-Za-z_:][\w:.-]*(?=\s)(?!(?:[^>"\']|"[^"]*"|\'[^\']*\')*?(?<=\s)term\s*=\s*(["'])\s*occasionally\s*\1)(?!\s*/?>)\s+(?:".*?"|\'.*?\'|[^>]*?)+|/?[A-Za-z_:][\w:.-]*\s*/?)>
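A small Python check of the value-specific pattern might look like the following. Python, the variable names, and the sample string are assumptions for illustration. Because this pattern contains a capturing group for the quote character, finditer with group(0) is used rather than findall, which would return only the captured quote.

```python
import re

# The term="occasionally" pattern; \1 refers back to the opening quote.
NOT_TERM_OCCASIONALLY = re.compile(
    r'''<(?:[A-Za-z_:][\w:.-]*(?=\s)(?!(?:[^>"\']|"[^"]*"|\'[^\']*\')*?(?<=\s)term\s*=\s*(["'])\s*occasionally\s*\1)(?!\s*/?>)\s+(?:".*?"|\'.*?\'|[^>]*?)+|/?[A-Za-z_:][\w:.-]*\s*/?)>''',
    re.DOTALL,
)

# Hypothetical sample: one span with the excluded value, one with another value.
html = '<span term="occasionally">a</span><span term="Advil">b</span>'

matches = [m.group(0) for m in NOT_TERM_OCCASIONALLY.finditer(html)]
for tag in matches:
    print(tag)
```

The tag with term="occasionally" is skipped, while the tag with term="Advil" and both closing tags are matched.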
