RegEx: Don't match a certain character if it's inside quotes

Question

Disclosure: I have read this answer many times here on SO, and I know better than to use regex to parse HTML. This question is just to broaden my knowledge of regex.

Say I have this string:

some text <tag link="fo>o"> other text

I want to match the whole tag but if I use <[^>]+> it only matches <tag link="fo>.

How can I make sure that a > inside quotes is ignored?

I can trivially write a parser with a while loop to do this, but I want to know how to do it with regex.

Recommended answer

Regular Expression:

<[^>]*?(?:(?:('|")[^'"]*?\1)[^>]*?)*>



Online demo:

http://regex101.com/r/yX5xS8

I know this regex might be a headache to look at, so here is my explanation:

<                      # Open HTML tags
    [^>]*?             # Lazy Negated character class for closing HTML tag
    (?:                # Open Outside Non-Capture group
        (?:            # Open Inside Non-Capture group
            ('|")      # Capture group for quotes, backreference group 1
            [^'"]*?    # Lazy Negated character class for quotes
            \1         # Backreference 1
        )              # Close Inside Non-Capture group
        [^>]*?         # Lazy Negated character class for closing HTML tag
    )*                 # Close Outside Non-Capture group
>                      # Close HTML tags
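
To see the difference in practice, here is a minimal sketch that runs both the naive pattern from the question and the pattern above against the sample string. It assumes Python's re module; the names TAG_RE, NAIVE_RE and sample are only illustrative and are not part of the original answer.

```python
import re

# Pattern from the answer: an optional run of non-'>' characters, then any
# number of quoted strings (each followed by more non-'>' characters).
TAG_RE = re.compile(r"""<[^>]*?(?:(?:('|")[^'"]*?\1)[^>]*?)*>""")

# Naive pattern from the question: stops at the first '>' it meets.
NAIVE_RE = re.compile(r"<[^>]+>")

sample = 'some text <tag link="fo>o"> other text'

naive_match = NAIVE_RE.search(sample).group(0)
full_match = TAG_RE.search(sample).group(0)

print(naive_match)  # <tag link="fo>
print(full_match)   # <tag link="fo>o">
```

The backreference \1 forces each quoted value to close with the same quote character that opened it, which is why the > inside "fo>o" is consumed as part of the attribute value rather than ending the tag. As the question already notes, this is an exercise in regex and not a substitute for a real HTML parser.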
