Regex for nested XML attributes

Question

Let's say I have the following string:

"<aa v={<dd>sop</dd>} z={ <bb y={ <cc x={st}>ABC</cc> }></bb> }></aa>"

How can I write a general-purpose regex (tag names and attribute names vary) that matches the content inside {}, that is, either <dd>sop</dd> or <bb y={ <cc x={st}>ABC</cc> }></bb>?

The regex I wrote, "(\s*\w*=\s*\{)\s*(<.*>)\s*(\})", matches

"<dd>sop</dd>} z={ <bb y={ <cc x={st}>ABC</cc> }></bb>", which is not correct.
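The over-match can be reproduced with a few lines of Python. This is only a sketch: the choice of Python's `re` module is an assumption, since no regex engine is named here, and the variable names are made up for illustration. The greedy `.*` runs to the end of the string and backtracks only as far as the last `>` that is followed by a closing brace.

```python
import re

# Example string from the question.
s = '<aa v={<dd>sop</dd>} z={ <bb y={ <cc x={st}>ABC</cc> }></bb> }></aa>'

# The regex from the question, unchanged.
pattern = re.compile(r'(\s*\w*=\s*\{)\s*(<.*>)\s*(\})')

m = pattern.search(s)
# Group 2 is the part between the braces; the greedy `.*` swallows
# the end of the first attribute and most of the second one.
overmatched = m.group(2)
print(overmatched)
```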

Solution

In generic regex there's no good way to handle nesting. Hence all the whining when a question like this comes up: never use regex to parse XML/HTML.

In some simple cases it might be advantageous though. If, like in your example, there's a limited number of levels of nesting, you can quite simply add one regex for each level.
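As an aside, when the nesting depth is not bounded, a plain scanner that counts braces is a common alternative to regex. The sketch below is not a regex technique at all; the function name `top_level_braces` is hypothetical, and it assumes braces are balanced and never appear inside quoted strings.

```python
def top_level_braces(text):
    """Return the stripped contents of each outermost {...} group in text."""
    groups = []
    depth = 0        # current brace nesting depth
    start = None     # index just after the outermost opening brace
    for i, ch in enumerate(text):
        if ch == '{':
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == '}' and depth > 0:
            depth -= 1
            if depth == 0:
                groups.append(text[start:i].strip())
    return groups

s = '<aa v={<dd>sop</dd>} z={ <bb y={ <cc x={st}>ABC</cc> }></bb> }></aa>'
groups = top_level_braces(s)
print(groups)
```

This handles any depth, but it gives up the convenience of a single pattern, which is what the rest of the answer is about.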

Now let's do this in steps. To handle the first un-nested attribute you can use

{[^}]*}

This matches an opening brace, followed by any number of characters that are not a closing brace, followed by a closing brace. For simplicity I'm gonna put the heart of it in a non-capturing group, like

{(?:[^}])*}

The group is needed later, when the alternatives are inserted.
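To see what the un-nested pattern does on the example, here is a small Python sketch (Python is an assumption; the braces are escaped so the pattern does not depend on how a given engine treats a bare `{`). Both forms of the pattern give the same matches, and the nested attribute is cut off at the first closing brace.

```python
import re

s = '<aa v={<dd>sop</dd>} z={ <bb y={ <cc x={st}>ABC</cc> }></bb> }></aa>'

flat = re.compile(r'\{[^}]*\}')         # plain form
grouped = re.compile(r'\{(?:[^}])*\}')  # same thing, core in a non-capturing group

flat_matches = flat.findall(s)
grouped_matches = grouped.findall(s)

print(flat_matches)
print(flat_matches == grouped_matches)
```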

If you now let the "anything but a closing brace" part ([^}]) alternatively be another nested pair of braces, by simply joining in the first regex as an alternative, like

{(?:{[^}]*}|[^}])*}
    ^^^^^^^    original regex inserted as an alternative (to itself)

it allows for one level of nesting. Doing the same again, joining this regex as an alternative to itself, like

{(?:{(?:{[^}]*}|[^}])*}|{[^}]*}|[^}])*}
    ^^^^^^^^^^^^^^^^^^^    previous level repeated

will allow for another level of nesting. This can be repeated for more levels if wanted.
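The repetition can also be done mechanically. The following Python sketch builds the pattern for a given number of levels; the helper name `nested_braces` is hypothetical, and the pattern it produces for two levels leaves out the extra `{[^}]*}` alternative shown above, which the first alternative already covers.

```python
import re

def nested_braces(levels):
    """Build a brace pattern that tolerates `levels` levels of nesting."""
    pattern = r'\{[^}]*\}'  # level 0: no nesting
    for _ in range(levels):
        # Insert the previous level as an alternative to [^}].
        pattern = r'\{(?:' + pattern + r'|[^}])*\}'
    return pattern

s = '<aa v={<dd>sop</dd>} z={ <bb y={ <cc x={st}>ABC</cc> }></bb> }></aa>'

one_level = re.findall(nested_braces(1), s)
two_levels = re.findall(nested_braces(2), s)

print(one_level)   # z attribute is still cut short
print(two_levels)  # z attribute is matched in full
```

The example string nests braces two levels deep inside the z attribute, so one level is not enough and two levels are.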

This doesn't handle capturing attribute names and the like, though, because your question isn't quite clear on what you want there. It does show one way (IMO the easiest to understand, or... :P) to handle nesting in regex.
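If capturing is wanted, one possible extension is to put a capturing group around the attribute name and another around the brace contents. This is a guess at the intent, not something stated in the question; the names `BODY` and `ATTR` are made up, and the pattern is fixed at two levels of nesting.

```python
import re

# Contents of a brace pair, allowing two levels of nested braces inside.
BODY = r'(?:\{(?:\{[^}]*\}|[^}])*\}|[^}])*'

# Group 1: attribute name. Group 2: everything between the outer braces.
ATTR = re.compile(r'(\w+)\s*=\s*\{(' + BODY + r')\}')

s = '<aa v={<dd>sop</dd>} z={ <bb y={ <cc x={st}>ABC</cc> }></bb> }></aa>'

attrs = [(m.group(1), m.group(2).strip()) for m in ATTR.finditer(s)]
for name, value in attrs:
    print(name, '->', value)
```

Only the outermost attributes (v and z) are reported, because the nested y and x attributes are consumed as part of z's value.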

You can see it handle your example at regex101.

Regards
