I cannot find the error in my regex code

Question

I want to use preg_match_all to extract class and data attributes from HTML.

I asked a similar question before, and the accepted answer there used the DOM. As an alternative to the DOM approach, I also need a regex version.

The pattern works fine when each tag is on its own line. However, when tags sit side by side on the same line, it also picks up class names from tags that should not be matched.

<div class="noproblem"> 
    <ul class="noproblem" data-ss="1">
        <li class="noproblem" data-ss="1">
            <!-- <i> is not my tag. but there s no problem with that. because it s underneath . -->
            <i class="no_problem"></i>
        </li>
    </ul>
</div>

<div class="noproblem" data-ss"1">  <!-- problem: data-ss is not accepted -->
    <ul class="noproblem" data-ss="1">
        <!-- <i> is not my tag. my tags:  div|ul|li . -->
        <li class="noproblem"><i class="this_is_problem"></i>
        </li>
    </ul>
</div>

<div class="noproblem">
    <ul class="noproblem">
        <!-- <i> is not my tag. my tags:  div|ul|li . -->
        <li class="noproblem"><i class="this_is_problem"></i>
        </li>
        <!-- <span> is not my tag. my tags:  div|ul|li . -->
        <li class="test"><span class="this_is_problem"></span></li>
        <!-- (li class empty version): <span> is not my tag. my tags:  div|ul|li . -->
        <li><span class="this_is_problem"></span></li>
    </ul>
</div>

Regex pattern:

$pattern = '/<(?:div|ul|li)(?:.*?(?:class|data-ss)="([^"]+)")?(?:.*?(?:class|data-ss)="([^"]+)")?[^>]*>/'; 

Example and problem: https://regex101.com/r/vSIsac/5

Alternative source (my old question): https://stackoverflow.com/a/51778865/6320082
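
The side-by-side failure described in the question can be reproduced in isolation. The following is a minimal sketch that runs the question's pattern against one line of the sample HTML; the variable names `$html` and `$m` are chosen here for illustration. It suggests the cause: `.` matches `>`, so the lazy `.*?` in the second optional group can run past the end of the `<li>` tag and reach the `class` attribute of the neighbouring `<i>` tag. Since `.` does not match a newline by default, this only happens when both tags are on the same line.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '/<(?:div|ul|li)(?:.*?(?:class|data-ss)="([^"]+)")?(?:.*?(?:class|data-ss)="([^"]+)")?[^>]*>/';

// One line from the sample: <li> and <i> sit side by side.
$html = '<li class="noproblem"><i class="this_is_problem"></i>';

preg_match($pattern, $html, $m);

// $m[0] is the whole match; it extends past the end of the <li> tag.
echo $m[0], "\n"; // <li class="noproblem"><i class="this_is_problem">
echo $m[1], "\n"; // noproblem
echo $m[2], "\n"; // this_is_problem (taken from the <i> tag)
```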

Recommended answer

If you really need to use a regex, try this:

<(?:div|ul|li)(?=[^>]*\bclass="([^"]+)")(?=(?:[^>]*\bdata-\w+="([^"]+)")?)

The class value is in the first capturing group ($1), and the data value, if present, is in the second capturing group ($2).

Demo

Explanation:

<(?:div|ul|li)  # a div, ul, or li tag

 # Lookahead expressions:

 # match any number of characters other than '>' (so the search stays
 # inside the current tag), then the class attribute
 (?= # lookahead
    [^>]*\bclass="([^"]+)"
 )

 # match any number of characters other than '>', then a data-* attribute
 # the attribute is optional, so the inner group is made optional with ?
 (?=
    (?:
        [^>]*\bdata-\w+="([^"]+)"
    )? # optional
 )
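
As a rough usage sketch, the pattern above could be passed to preg_match_all as shown below. The HTML string is a shortened, hypothetical version of the question's sample with all tags on one line, and the `$result` array shape is an assumption made for illustration, not part of the original answer. Because the class lookahead is mandatory, a tag without a class attribute (such as a bare `<li>`) is not matched at all.

```php
<?php
// The answer's pattern, wrapped in PHP delimiters.
$pattern = '/<(?:div|ul|li)(?=[^>]*\bclass="([^"]+)")(?=(?:[^>]*\bdata-\w+="([^"]+)")?)/';

// Hypothetical sample: wanted tags (div|ul|li) side by side with unwanted ones (i, span).
$html = '<div class="a" data-ss="1"><ul class="b"><li class="c"><i class="skip"></i></li>'
      . '<li><span class="skip"></span></li></ul></div>';

// PREG_SET_ORDER yields one array per matched tag.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$result = [];
foreach ($matches as $m) {
    // Group 2 is absent when the tag has no data-* attribute.
    $result[] = ['class' => $m[1], 'data' => $m[2] ?? null];
}

print_r($result);
```

With this input, the `<i>` and `<span>` classes are skipped because `[^>]*` cannot cross the closing `>` of the tag being inspected.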