I cannot find the error in my regex code
Question
I want to use preg_match_all to extract the class and data attributes from HTML.
I asked a similar question before, and the correct answer to that question used DOM. As an alternative to the DOM approach, I also need a regex version.
The pattern mostly works. However, when tags sit side by side on the same line, it also picks up class names from tags that should not be matched.
<div class="noproblem">
<ul class="noproblem" data-ss="1">
<li class="noproblem" data-ss="1">
<!-- <i> is not my tag. but there s no problem with that. because it s underneath . -->
<i class="no_problem"></i>
</li>
</ul>
</div>
<div class="noproblem" data-ss"1"> <!-- problem: data-ss is not accepted -->
<ul class="noproblem" data-ss="1">
<!-- <i> is not my tag. my tags: div|ul|li . -->
<li class="noproblem"><i class="this_is_problem"></i>
</li>
</ul>
</div>
<div class="noproblem">
<ul class="noproblem">
<!-- <i> is not my tag. my tags: div|ul|li . -->
<li class="noproblem"><i class="this_is_problem"></i>
</li>
<!-- <span> is not my tag. my tags: div|ul|li . -->
<li class="test"><span class="this_is_problem"></span></li>
<!-- (li class empty version): <span> is not my tag. my tags: div|ul|li . -->
<li><span class="this_is_problem"></span></li>
</ul>
</div>
Regex pattern:
$pattern = '/<(?:div|ul|li)(?:.*?(?:class|data-ss)="([^"]+)")?(?:.*?(?:class|data-ss)="([^"]+)")?[^>]*>/';
Example and problem: https://regex101.com/r/vSIsac/5
Alternative source (my old question): https://stackoverflow.com/a/51778865/6320082
Recommended answer
If you really need to use a regex, try this:
<(?:div|ul|li)(?=[^>]*\bclass="([^"]+)")(?=(?:[^>]*\bdata-\w+="([^"]+)")?)
You'll get the class value in the first capturing group ($1) and the data value, if it exists, in the second capturing group ($2).
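As a rough usage sketch, this is how the pattern could be applied with preg_match_all. The $html sample and the $found array are made up for illustration and are not taken from the question. The trailing group is normalized to null because an unmatched last group may be missing from the match array.

```php
<?php
// Hypothetical sample markup, not the asker's full document.
$html = <<<'HTML'
<div class="box" data-ss="1">
<ul class="list">
<li class="item"><i class="icon"></i></li>
<li><span class="label"></span></li>
</ul>
</div>
HTML;

// Lookahead pattern from the answer, wrapped in "/" delimiters.
$pattern = '/<(?:div|ul|li)(?=[^>]*\bclass="([^"]+)")(?=(?:[^>]*\bdata-\w+="([^"]+)")?)/';

// PREG_SET_ORDER yields one array per matched tag.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$found = [];
foreach ($matches as $m) {
    $found[] = [
        'class' => $m[1],
        // Group 2 is optional: treat "missing" and "empty" alike.
        'data'  => ($m[2] ?? '') !== '' ? $m[2] : null,
    ];
}

print_r($found);
```

With this sample, only the div, ul and first li are reported. The nested i and span are ignored, and the second li is skipped entirely because the class lookahead is mandatory and that tag has no class attribute.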
Explanation:
<(?:div|ul|li)   # div, ul or li tag
# Lookahead expressions:
# match any characters other than '>', any number of times, then class
(?=              # lookahead
  [^>]*\bclass="([^"]+)"
)
# match any characters other than '>', any number of times, then a data attribute
# Since the data attribute is optional, the whole inner expression is made optional with ?
(?=
  (?:
    [^>]*\bdata-\w+="([^"]+)"
  )?             # optional
)
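To see why restricting the search to [^>]* matters, here is a small sketch that runs the pattern from the question and the lookahead pattern against one of the problem lines. The variable names ($line, $original, $lookahead, $leaked, $dataValue) are only illustrative.

```php
<?php
// One of the problem lines from the question.
$line = '<li class="noproblem"><i class="this_is_problem"></i>';

// Pattern from the question: ".*?" is allowed to run past ">" into the next tag.
$original = '/<(?:div|ul|li)(?:.*?(?:class|data-ss)="([^"]+)")?(?:.*?(?:class|data-ss)="([^"]+)")?[^>]*>/';

// Lookahead pattern: "[^>]*" cannot cross the ">" that closes the tag.
$lookahead = '/<(?:div|ul|li)(?=[^>]*\bclass="([^"]+)")(?=(?:[^>]*\bdata-\w+="([^"]+)")?)/';

preg_match($original, $line, $a);
preg_match($lookahead, $line, $b);

// Second group of the original pattern: filled from the nested <i> tag.
$leaked = $a[2] ?? null;
// Second group of the lookahead pattern: unset, since <li> has no data attribute.
$dataValue = $b[2] ?? null;

var_dump($a[1], $leaked);    // "noproblem", "this_is_problem"
var_dump($b[1], $dataValue); // "noproblem", NULL
```

In the original pattern the second optional group finds nothing more inside the li tag, so its lazy .*? keeps expanding across the closing > and captures the class of the i tag. The lookahead version stops at the first >, so only attributes of the li tag itself can be captured.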