Regular expression to match empty HTML tags that may contain embedded JSTL?


Question

I'm trying to construct a regular expression to look for empty html tags that may have embedded JSTL. I'm using Perl for my matching.

So far I can match any empty HTML tag that does not contain JSTL with the following:

/<\w+\b(?!:)[^<]*?>\s*<\/\w+/si

The \b(?!:) avoids matching an opening JSTL tag, but it doesn't address whether JSTL may appear within the HTML tag itself (which is allowed). I only want to know whether this HTML tag has no children (it is empty or contains only whitespace). So I'm looking for a pattern that would match both of the following:

<div id="my-id"> 
</div>
<div class="<c:out var="${my.property}" />"></div>

Currently the first div matches. The second does not. Is it doable? I tried several variations using lookahead assertions, and I'm starting to think it's not. However, I can't say for certain or articulate why it's not.
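
The behaviour described above can be reproduced with a short sketch. It uses Python's `re` module rather than Perl, which is an assumption made purely for illustration (the constructs in this pattern behave the same way in both). The names `ORIGINAL`, `plain_div`, `jstl_div`, and `is_flagged` are hypothetical.

```python
import re

# The pattern from the question; Perl's /s and /i flags map to re.S and re.I.
# The "/" needs no escaping here because Python has no pattern delimiters.
ORIGINAL = re.compile(r'<\w+\b(?!:)[^<]*?>\s*</\w+', re.S | re.I)

plain_div = '<div id="my-id"> \n</div>'
jstl_div = '<div class="<c:out var="${my.property}" />"></div>'


def is_flagged(html):
    """Return True if the pattern finds an empty tag anywhere in html."""
    return ORIGINAL.search(html) is not None


print(is_flagged(plain_div))  # True
print(is_flagged(jstl_div))   # False
```

The second case fails because `[^<]*?` cannot cross the `<` of the embedded `<c:out`, and starting the match at `<c:out` itself is rejected by `(?!:)`.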

Edit: I'm not writing something to interpret the code, and I'm not interested in using a parser. I'm writing a script to point out potential issues/oversights. And at this point, I'm curious, too, to see if there is something clever with lookaheads or lookbehinds that I may be missing. If it bothers you that I'm trying to "solve" a problem this way, don't think of it as looking for a solution. To me it's more of a challenge now, and an opportunity to learn more about regular expressions.

Also, if it helps, you can assume that the html is xhtml strict.

Solution

Try

<(\w+)(?:\s+\w+="[^"]+(?:"\$[^"]+"[^"]+)?")*>\s*</\1>

A short explanation:

<            # match a '<'
(\w+)        # match one or more a-z, A-Z, 0-9 or '_' and store it in group 1 
(?:          # open non-capturing group 1
  \s+        #   match one or more white space characters 
  \w+        #   match one or more a-z, A-Z, 0-9 or '_'
  ="         #   match '="'
  [^"]+      #   match one or more characters other than '"'
  (?:        #   open non-capturing group 2
    "\$      #     match '"$'
    [^"]+    #     match one or more characters other than '"'
    "        #     match '"'
    [^"]+    #     match one or more characters other than '"'
  )?         #   close non-capturing group 2, and make it optional
  "          #   match '"'
)*           # close non-capturing group 1, and repeat it zero or more times
>            # match '>'
\s*          # match zero or more white space characters
</\1>        # match '</X>' where `X` is what is captured in group 1
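
The commented breakdown above can be written almost verbatim as a verbose-mode pattern. The sketch below uses Python's `re.VERBOSE` (Perl's counterpart is the `/x` modifier). The choice of Python and the names `EMPTY_TAG` and `samples` are assumptions made for illustration.

```python
import re

# Whitespace and "#" comments inside the pattern are ignored in verbose mode.
EMPTY_TAG = re.compile(r'''
    <(\w+)                    # tag name, captured as group 1
    (?:                       # one attribute, repeated zero or more times
        \s+\w+="              #   whitespace, attribute name, then ="
        [^"]+                 #   attribute text up to the next quote
        (?:"\$[^"]+"[^"]+)?   #   optional nested "${...}" plus trailing text
        "                     #   closing quote of the attribute
    )*
    >\s*</\1>                 # end of open tag, optional whitespace, same tag closed
''', re.VERBOSE)

samples = [
    '<div id="my-id"> \n</div>',
    '<div class="<c:out var="${my.property}" />"></div>',
]

for html in samples:
    match = EMPTY_TAG.search(html)
    print(match.group(1) if match else None)  # prints "div" for both samples
```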

This works for both of your examples, but I am sure someone can construct HTML that you would want to match but that this regex will not match. After reading your edit, though, it seems you are already aware of that.
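
To make that caveat concrete, here are a few empty tags that the pattern above does not match. This is a sketch in Python's `re` (an assumption, as are the names `ANSWER`, `missed`, and `matches`). The inputs were chosen for illustration and are not from the original question.

```python
import re

# The answer's pattern, unchanged.
ANSWER = re.compile(r'<(\w+)(?:\s+\w+="[^"]+(?:"\$[^"]+"[^"]+)?")*>\s*</\1>')

missed = [
    '<div class=""></div>',     # empty attribute value: [^"]+ needs at least one character
    '<div data-id="x"></div>',  # hyphenated attribute name: \w+ stops at "-"
    "<div id='my-id'></div>",   # single-quoted attribute value
]


def matches(html):
    """Return True if the answer's pattern finds an empty tag in html."""
    return ANSWER.search(html) is not None


for html in missed:
    print(html, matches(html))  # each line ends with False
```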
