Regular expression for extracting tag attributes

Problem description

I'm trying to extract the attributes of an anchor tag (<a>). So far I have this expression:

(?<name>\b\w+\b)\s*=\s*("(?<value>[^"]*)"|'(?<value>[^']*)'|(?<value>[^"'<> \s]+)\s*)+

which works for strings like

<a href="test.html" class="xyz">

and (single quotes)

<a href='test.html' class="xyz">

but not for strings without quotes:

<a href=test.html class=xyz>
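Why the pattern fails on unquoted values can be reproduced with a small sketch. It assumes Python's `re` module. The original reuses the group name `value` three times, which suggests a .NET-style engine, but Python forbids duplicate group names. The three groups are therefore renamed to the hypothetical `dq`, `sq` and `uq`.

On the unquoted tag, the `+` after the value group lets the unquoted alternative run a second time. Because `=` is not excluded by `[^"'<> \s]`, that second pass swallows `class=xyz`. The result is a single match for `href` whose last value capture is `class=xyz`. The `FIXED` variant below drops the `+` and adds `=` to the excluded characters, which appears to handle all three example tags.

```python
import re

# Python port of the question's pattern. Python's re forbids duplicate group
# names, so the three "value" groups are renamed dq / sq / uq (hypothetical).
ORIGINAL = re.compile(
    r"""(?P<name>\b\w+\b)\s*=\s*("(?P<dq>[^"]*)"|'(?P<sq>[^']*)'|(?P<uq>[^"'<> \s]+)\s*)+"""
)

# Same alternatives, but the value group is no longer repeated and "=" is
# excluded from unquoted values, so "class=xyz" cannot be swallowed.
FIXED = re.compile(
    r"""(?P<name>\b\w+\b)\s*=\s*(?:"(?P<dq>[^"]*)"|'(?P<sq>[^']*)'|(?P<uq>[^"'<>=\s]+))"""
)


def attributes(pattern, tag):
    """Return (name, value) pairs, using the first of dq/sq/uq that participated."""
    pairs = []
    for m in pattern.finditer(tag):
        value = next(v for v in (m.group("dq"), m.group("sq"), m.group("uq")) if v is not None)
        pairs.append((m.group("name"), value))
    return pairs


if __name__ == "__main__":
    tag = "<a href=test.html class=xyz>"
    print(attributes(ORIGINAL, tag))  # [('href', 'class=xyz')]
    print(attributes(FIXED, tag))     # [('href', 'test.html'), ('class', 'xyz')]
```

Like the original, `FIXED` only accepts names made of `\w` characters. A name such as `data-id` would therefore be matched as `id`.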

How can I modify my regex so that it also works with unquoted attributes? Or is there a better way to do this?

Thanks!

Update: Thanks for all the good comments and advice so far. One thing I didn't mention: unfortunately, I have to patch code that I didn't write myself, and there is no time or money to rewrite it from the ground up.
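This sketch addresses the "better way" part of the question, for the case where a parser can be introduced despite the constraint in the update. It assumes Python. The standard-library `html.parser.HTMLParser` already accepts double-quoted, single-quoted and unquoted attribute values. `AnchorAttrs` and `anchor_attributes` are hypothetical names.

```python
from html.parser import HTMLParser


class AnchorAttrs(HTMLParser):
    """Collect the attribute list of every <a> start tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; quotes are already removed,
        # and unquoted values are accepted as well.
        if tag == "a":
            self.anchors.append(attrs)


def anchor_attributes(markup):
    """Parse markup and return one attribute list per <a> tag."""
    parser = AnchorAttrs()
    parser.feed(markup)
    parser.close()
    return parser.anchors


if __name__ == "__main__":
    print(anchor_attributes("<a href=test.html class=xyz>"))
    # [[('href', 'test.html'), ('class', 'xyz')]]
```

`handle_starttag` receives attribute names lowercased and values with character references unescaped, which a hand-written regex does not do.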

Solution

If you have an element like

<name attribute=value attribute="value" attribute='value'>

this regex can be used to find each attribute name and value in turn:

(\S+)=["']?((?:.(?!["']?\s+(?:\S+)=|[>"']))+.)["']?

Applied to:

<a href=test.html class=xyz>
<a href="test.html" class="xyz">
<a href='test.html' class="xyz">

it would yield:

'href' => 'test.html'
'class' => 'xyz'
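Here is a sketch of the answer's pattern in Python. Python is an assumption, since the answer does not name a language. The pattern is written with `re.VERBOSE` so that each part can carry a comment. `re.findall` returns one `(name, value)` tuple per attribute, which should reproduce the output above for all three tags and for the `<name ...>` example. The helper `attributes` and the `TAGS` list are illustrative names.

```python
import re

# The answer's pattern, spelled out with re.VERBOSE (whitespace and # comments
# outside character classes are ignored). Group 1 = name, group 2 = value.
ATTR = re.compile(
    r"""
    (\S+)=                      # 1: attribute name (non-space run before '=')
    ["']?                       # optional opening quote
    (                           # 2: the value
      (?:
        .                       # one value character...
        (?!                     # ...unless it is directly followed by
          ["']?\s+(?:\S+)=      #   an optional quote, whitespace and the next name=
          |[>"']                #   or by '>' or a quote
        )
      )+
      .                         # the last value character
    )
    ["']?                       # optional closing quote
    """,
    re.VERBOSE,
)


def attributes(tag):
    """Return the (name, value) pairs found in one tag, in order."""
    return ATTR.findall(tag)


TAGS = [
    "<a href=test.html class=xyz>",
    '<a href="test.html" class="xyz">',
    "<a href='test.html' class=\"xyz\">",
]

if __name__ == "__main__":
    for tag in TAGS:
        for name, value in attributes(tag):
            print(f"'{name}' => '{value}'")
```

The value is matched as `(?:...)+.`, so it must be at least two characters long. An attribute such as `class=x` produces no match at all, which is worth checking before relying on this pattern.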