Regex for finding elements without a certain attribute (e.g., "id")

Question

I'm scrubbing through a large number of XML-based files in a JSF project and would like to find certain components that are missing an id attribute. For example, let's say I want to find all of the <h:inputText /> elements that do not have an id attribute specified.

I've tried the following in RAD (Eclipse), but something's not quite right because I still get some components that do have a valid ID.

<([hf]|ig):(?!output)\w+\s+(?!\bid\b)[^>]*?\s+(?!\bid\b)[^>]*?>

I'm not sure whether my negative lookahead is correct.

Ideally, the search would find the following (or similar) in any JSP in the project:

<h:inputText value="test" />

...but not

<h:inputText id="good_id" value="test" />

I'm just using <h:inputText/> as an example. I was trying to be broader than that, but definitely excluding <h:outputText/>.
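A quick way to see what goes wrong is to run the attempted pattern against the two examples above plus two small variants. This sketch uses Python's re module as a stand-in for the Eclipse search box; that substitution is an assumption, but the constructs involved (\b, negative lookahead, lazy quantifiers) behave the same way in Java's java.util.regex.

```python
import re

# The pattern from the question, unchanged.
ATTEMPT = re.compile(r'<([hf]|ig):(?!output)\w+\s+(?!\bid\b)[^>]*?\s+(?!\bid\b)[^>]*?>')

# The two examples from the question plus two variants.
samples = {
    "no id": '<h:inputText value="test" />',
    "id first": '<h:inputText id="good_id" value="test" />',
    "id second": '<h:inputText value="test" id="good_id" />',
    "no id, no space before />": '<h:inputText value="test"/>',
}

for label, tag in samples.items():
    print(f"{label:28} matched={ATTEMPT.search(tag) is not None}")
```

Run as-is, it reports a match for "id second" (a false positive) and no match for the last sample (a missed tag). Each (?!\bid\b) is checked at only one spot, right after a run of whitespace, while [^>]*? is free to swallow an id attribute in between; and requiring two separate \s+ runs misses a tag that has a single attribute and no space before />.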

Answer

Disclaimer:

As others correctly point out, it is best to use a dedicated parser when working with non-regular markup languages such as XML/HTML. There are many ways for a regex solution to fail with either false positives or missed matches.
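For pages that are well-formed XML (for example Facelets .xhtml files), the parser route is short. Below is a minimal sketch with Python's standard xml.etree.ElementTree, assuming the h: prefix is bound to the classic JSF HTML namespace URI http://java.sun.com/jsf/html (adjust it if your pages declare another); elements_without_id is a hypothetical helper name. JSP pages written in standard syntax, with <%@ taglib %> directives and scriptlets, are not XML and will not parse this way, which is where the regex below earns its keep.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI bound to the "h:" prefix in the pages being checked.
JSF_HTML_NS = "http://java.sun.com/jsf/html"


def elements_without_id(xml_text, local_names, ns=JSF_HTML_NS):
    """Return elements in namespace `ns` whose local name is in `local_names`
    and which have no id attribute."""
    root = ET.fromstring(xml_text)
    # ElementTree stores qualified tags as "{namespace-uri}localName".
    wanted = {f"{{{ns}}}{name}" for name in local_names}
    return [el for el in root.iter() if el.tag in wanted and el.get("id") is None]


page = """<html xmlns:h="http://java.sun.com/jsf/html">
  <h:inputText value="test" />
  <h:inputText id="good_id" value="test" />
  <h:outputText value="ignored" />
</html>"""

for el in elements_without_id(page, {"inputText"}):
    print(el.tag, el.attrib)
```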

This particular problem is a one-shot editing problem and the target text (an open tag) is not a nested structure. Although there are ways for the following regex solution to fail, it should still do a pretty good job.

I don't know Eclipse's regex syntax, but if it supports negative lookahead, the following regex will match a list of specific target elements that do not have an id attribute. First, here it is in PHP/PCRE free-spacing mode with comments, for readability:

$re_open_tags_with_no_id_attrib = '%
    # Match specific element open tags having no "id" attribute.
    <                    # Literal "<" start of open tag.
    (?:                  # Group of target element names.
      h:inputText        # Either h:inputText element,
    | h:otherTag         # or h:otherTag element,
    | h:anotherTag       # or h:anotherTag element.
    )                    # End group of target element names.
    (?:                  # Zero or more open tag attributes.
      \s+                # Whitespace required before each attribute.
      (?!id\b)           # Assert this attribute not named "id".
      [\w\-.:]+          # Non-"id" attribute name.
      (?:                # Group for optional attribute value.
        \s*=\s*          # Value separated by =, optional ws.
        (?:              # Group of attrib value alternatives.
          "[^"]*"        # Either double quoted value,
        | \'[^\']*\'     # or single quoted value,
        | [\w\-.:]+      # or unquoted value.
        )                # End group of value alternatives.
      )?                 # Attribute value is optional.
    )*                   # Zero or more open tag attributes.
    \s*                  # Optional whitespace before close.
    /?                   # Optional empty tag slash before >.
    >                    # Literal ">" end of open tag.
    %x';
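To try the pattern outside Eclipse, here is a direct port to Python's re module, where re.VERBOSE plays the role of the /x flag (whitespace and # comments are ignored). Treat it as a test harness under that assumption; the pattern itself relies only on non-capturing groups, negative lookahead and character classes, which PCRE and Java's java.util.regex also support.

```python
import re

# Port of the commented PCRE pattern; re.VERBOSE stands in for /x.
OPEN_TAG_WITHOUT_ID = re.compile(r"""
    <                                        # Literal "<" start of open tag.
    (?:h:inputText|h:otherTag|h:anotherTag)  # Target element names.
    (?:                                      # Zero or more attributes,
      \s+                                    # each preceded by whitespace
      (?!id\b)                               # and not named "id".
      [\w\-.:]+                              # Attribute name.
      (?:\s*=\s*                             # Optional "=" and value:
        (?:"[^"]*"|'[^']*'|[\w\-.:]+)        # double-quoted, single-quoted or unquoted.
      )?
    )*
    \s*/?>                                   # Optional whitespace and "/" before ">".
    """, re.VERBOSE)

examples = [
    '<h:inputText value="test" />',               # no id: should match
    '<h:inputText id="good_id" value="test" />',  # id first: no match
    '<h:inputText value="test" id="good_id" />',  # id later: no match
    '<h:inputText\n    value="a > b"/>',          # multi-line, ">" inside a value: match
    '<h:outputText value="test" />',              # not a target element: no match
]
for tag in examples:
    print(repr(tag), "->", OPEN_TAG_WITHOUT_ID.search(tag) is not None)
```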

And here is the same regex in bare-bones native format which may be suitable for copy and paste into an Eclipse search box:

<(?:h:inputText|h:otherTag|h:anotherTag)(?:\s+(?!id\b)[\w\-.:]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[\w\-.:]+))?)*\s*/?>

Note the group of target element names at the beginning of this expression; you can add or remove target elements in this ORed list. Note also that this expression is designed to work pretty well for HTML as well as XML: it handles value-less attributes, unquoted attribute values, and quoted attribute values containing < or > angle brackets.
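Since the element list is the part you are most likely to edit, it can help to build the pattern from a list of names. The sketch below does that in Python; build_pattern, find_open_tags_without_id and scan are hypothetical helper names, re.escape guards against regex metacharacters in the names, and the default "*.jsp" glob is an assumption about where the pages live.

```python
import re
from pathlib import Path

# Attribute value alternatives: double-quoted, single-quoted or unquoted.
ATTR_VALUE = r"""(?:"[^"]*"|'[^']*'|[\w\-.:]+)"""
# One attribute not named "id", with an optional value.
ATTRIBUTE = r"\s+(?!id\b)[\w\-.:]+(?:\s*=\s*" + ATTR_VALUE + r")?"


def build_pattern(element_names):
    """Compile the answer's pattern for the given element names (hypothetical helper)."""
    names = "|".join(re.escape(n) for n in element_names)
    return re.compile(r"<(?:" + names + r")(?:" + ATTRIBUTE + r")*\s*/?>")


def find_open_tags_without_id(text, pattern):
    """Yield (line_number, tag_text) for each match; line numbers are 1-based."""
    for m in pattern.finditer(text):
        yield text.count("\n", 0, m.start()) + 1, m.group()


def scan(root, element_names, glob="*.jsp"):
    """Print path:line: tag for every matching open tag under `root`."""
    pattern = build_pattern(element_names)
    for path in Path(root).rglob(glob):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line, tag in find_open_tags_without_id(text, pattern):
            print(f"{path}:{line}: {tag}")


pattern = build_pattern(["h:inputText", "h:selectOneMenu"])
sample = ('<h:inputText value="a" />\n'
          '<h:inputText id="x" value="b" />\n'
          '<h:selectOneMenu value="c">')
print(list(find_open_tags_without_id(sample, pattern)))
```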
